PyEllips / pyElli

An open source ellipsometry analysis tool for reproducible and comprehensible building of optical models.
https://pyelli.readthedocs.io
GNU General Public License v3.0

Access to data is index-wise, not wavelength-wise #152

Closed: ortrs closed this issue 1 year ago.

ortrs commented 1 year ago

I am trying to access the data from a .dat file by using pyElli but I encounter a few bugs:

Code excerpt:

import elli
from elli.fitting import ParamsHist, fit

psi_delta = elli.read_woollam_psi_delta("glass na za 300 1000 nm 10 nm stp depol+iso.log")

My data has λ = 300 to 1000 nm (10 nm steps) and φ = 55°, 56°, 57°, 58° (4 angles available).

I can access several points through psi_delta.loc, but there are a few I cannot:

psi_delta.loc[55][300:800]

does not return the data, but rather:

Empty DataFrame
Columns: [Ψ, Δ]
Index: []

As shown in the basic usage notebook (https://github.com/PyEllips/pyElli/blob/master/examples/Basic%20Usage/Basic%20Usage.ipynb), this should return the values of [Ψ, Δ].

The following access works correctly:

psi_delta.loc[55][0:50]

It seems that, for Woollam-like datasets, the data can only be accessed by index position, not by wavelength.
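A minimal pandas sketch of what is likely going on, assuming (as the maintainer diagnoses later in the thread) that the wavelength column also contains non-numeric "dpolE" entries. The frame is made up: the level names "Angle of Incidence" and "Wavelength" and all values are illustrative assumptions. With a string mixed in, pandas stores the wavelength level with dtype object, and for such an index an integer slice passed to df[...] is interpreted by position rather than by label. That would explain both observations: [0:50] returns the first 50 rows, while [300:800] is empty whenever an angle has fewer than 300 rows (300 to 1000 nm in 10 nm steps is only 71 wavelength points).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for psi_delta: a MultiIndex (angle, wavelength) whose
# Wavelength column also contains a non-numeric "dpolE" marker row per angle.
rows = []
for angle in (55.0, 56.0):
    for wavelength in (300.0, 310.0, 320.0, 330.0):
        rows.append((angle, wavelength, 10.0, 180.0))
    rows.append((angle, "dpolE", np.nan, np.nan))

psi_delta = pd.DataFrame(
    rows, columns=["Angle of Incidence", "Wavelength", "Ψ", "Δ"]
).set_index(["Angle of Incidence", "Wavelength"])

sub = psi_delta.loc[55]    # rows for 55°, indexed by Wavelength
print(sub.index.dtype)     # object: floats mixed with the string "dpolE"
print(len(sub[0:2]))       # 2: an integer slice on an object index is positional
print(sub[300:800].empty)  # True: positions 300..799 do not exist
```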

Also, trying to access nonexistent keys raises an unhandled error:

>>> elli.read_woollam_psi_delta("glass na za 300 1000 nm 10 nm stp depol+iso.log").loc[55][500]
Traceback (most recent call last):
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas\_libs\index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 500

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 500

The same happens for nonexistent angles:

>>> elli.read_woollam_psi_delta("glass na za 300 1000 nm 10 nm stp depol+iso.log").loc[70][500]
Traceback (most recent call last):
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas\_libs\index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1698, in pandas._libs.hashtable.Float64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1722, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 70.0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexing.py", line 1103, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexing.py", line 1343, in _getitem_axis
    return self._get_label(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexing.py", line 1293, in _get_label
    return self.obj.xs(label, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\generic.py", line 4088, in xs
    loc, new_index = index._get_loc_level(key, level=0)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\multi.py", line 3059, in _get_loc_level
    indexer = self._get_level_indexer(key, level=level)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\multi.py", line 3160, in _get_level_indexer
    idx = self._get_loc_single_level_index(level_index, key)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\multi.py", line 2752, in _get_loc_single_level_index
    return level_index.get_loc(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 70

This is probably related to the fact that elements cannot be accessed with float labels:

>>> elli.read_woollam_psi_delta("glass na za 300 1000 nm 10 nm stp depol+iso.log").loc[57.0][29.0:30.0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\frame.py", line 3735, in __getitem__
    indexer = self.index._convert_slice_indexer(key, kind="getitem")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 4132, in _convert_slice_indexer
    indexer = self.slice_indexer(start, stop, step)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 6344, in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 6537, in slice_locs
    start_slice = self.get_slice_bound(start, "left")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 6452, in get_slice_bound
    label = self._maybe_cast_slice_bound(label, side)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 6406, in _maybe_cast_slice_bound
    self._raise_invalid_indexer("slice", label)
  File "C:\env\ellip\.env\Lib\site-packages\pandas\core\indexes\base.py", line 4152, in _raise_invalid_indexer
    raise TypeError(msg)
TypeError: cannot do slice indexing on Index with these indexers [29.0] of type float 
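Assuming the wavelength index has dtype object because it contains a "dpolE" string (the cause the maintainer identifies below), the two exceptions above follow from ordinary pandas rules, as this hypothetical sketch with made-up values shows: a scalar inside df[...] is a column lookup, hence KeyError: 500, and a float slice whose bounds are not present in an object index is rejected with the TypeError from the last traceback. Row selection by wavelength label goes through .loc instead. The outcome helper is only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical per-angle frame: the Wavelength index mixes floats with "dpolE",
# so pandas keeps it as dtype=object (made-up values).
sub = pd.DataFrame(
    {"Ψ": [10.0, 11.0, np.nan], "Δ": [180.0, 181.0, np.nan]},
    index=pd.Index([300.0, 310.0, "dpolE"], dtype=object, name="Wavelength"),
)


def outcome(action):
    """Run action() and report the exception type name, or "ok"."""
    try:
        action()
    except Exception as err:
        return type(err).__name__
    return "ok"


# A scalar in df[...] selects a *column*, so 500 raises KeyError (first traceback).
print(outcome(lambda: sub[500]))
# Float slice bounds missing from an object index raise TypeError (last traceback).
print(outcome(lambda: sub[29.0:30.0]))
# Rows are selected by wavelength label with .loc.
print(sub.loc[310.0])
```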

Suggestions for an update:

>>> elli.read_woollam_psi_delta("glass na za 300 1000 nm 10 nm stp depol+iso.log").loc[55][500]
                   Ψ          Δ
Wavelength
600.000000  3.102291  183.40475

>>> elli.read_woollam_psi_delta("glass na za 300 1000 nm 10 nm stp depol+iso.log").loc[59][500]
Warning: Angle 59° is not available.
Available angles: [55,56,57,58]

# There is no 1100 nm in the measurement
>>> elli.read_woollam_psi_delta("glass na za 300 1000 nm 10 nm stp depol+iso.log").loc[55][1100]
Warning: Wavelength 1100 nm is not available.
Available wavelengths: [300 - 1000] nm

The unit for the x-axis should be available in the log itself. I can also try changing the units for an upcoming measurement to make sure this is possible (and that it works for other types of data that are read).

glass na za 300 1000 nm 10 nm stp depol+iso.log

domna commented 1 year ago

Hey @ortrs,

thank you for your issue and the detailed bug report.

> I am trying to access the data from a .dat file by using pyElli but I encounter a few bugs:
>
> Code excerpt:
>
> import elli
> from elli.fitting import ParamsHist, fit
>
> psi_delta = elli.read_woollam_psi_delta("glass na za 300 1000 nm 10 nm stp depol+iso.log")

This is a bug: your file contains additional dpolE rows, so the wavelength column holds more than just wavelength values, and the importer expects only numeric values in this column. I did a quick hotfix; if you like, you can try it with pip install git+https://github.com/PyEllips/pyElli.git@fix-wvase-importer. Could you give it a quick test to see whether it works for you?
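A hypothetical user-side sketch of the kind of cleanup such a fix needs (this is not the code on the fix-wvase-importer branch): split off rows whose wavelength label is not numeric, such as "dpolE", and convert the remaining wavelength level to float64 so that .loc slices by wavelength value. The level name "Wavelength", the example data, and the choice to return the depolarization rows separately instead of parsing them are all assumptions.

```python
import numpy as np
import pandas as pd


def split_depolarization_rows(df: pd.DataFrame):
    """Hypothetical helper: return (psi_delta, depol_rows).

    Rows whose Wavelength label is not numeric (e.g. "dpolE") are split off,
    and the remaining Wavelength level is converted to float64 so that
    .loc slicing works on wavelength values.
    """
    raw = pd.Series(np.asarray(df.index.get_level_values("Wavelength"), dtype=object))
    wavelengths = pd.to_numeric(raw, errors="coerce").to_numpy(dtype=float)
    is_numeric = ~np.isnan(wavelengths)  # "dpolE" was coerced to NaN

    psi_delta = df.loc[is_numeric].copy()
    psi_delta.index = pd.MultiIndex.from_arrays(
        [psi_delta.index.get_level_values(0), wavelengths[is_numeric]],
        names=df.index.names,
    )
    return psi_delta, df.loc[~is_numeric]


# Made-up example data with one "dpolE" row per angle.
rows = [(a, wl, 10.0, 180.0) for a in (55.0, 56.0) for wl in (300.0, 310.0, 320.0)]
rows += [(a, "dpolE", np.nan, np.nan) for a in (55.0, 56.0)]
raw_df = pd.DataFrame(rows, columns=["Angle of Incidence", "Wavelength", "Ψ", "Δ"])
raw_df = raw_df.set_index(["Angle of Incidence", "Wavelength"])

clean, depol = split_depolarization_rows(raw_df)
print(clean.loc[55].loc[300:310])  # label-based: the 300 nm and 310 nm rows
```

Keeping the split-off rows rather than discarding them leaves the depolarization values available for later processing.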

> My data has λ = 300 to 1000 nm (10 nm steps) and φ = 55°, 56°, 57°, 58° (4 angles available).
>
> I can access several points through psi_delta.loc, but there are a few I cannot:
>
> psi_delta.loc[55][300:800]
>
> does not return the data, but rather:
>
> Empty DataFrame
> Columns: [Ψ, Δ]
> Index: []

Yeah, I'm not entirely happy with accessing this by numeric values (because the stored values may carry small uncertainties), and we will soon refactor the whole data structure to provide a more convenient interface. In the meantime, a simple workaround is to select a small region around the value, e.g., psi_delta.loc[54.9:55.1], which accounts for slight numeric deviations from the exact value.
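A sketch of the tolerance-window workaround on made-up data, assuming angles are stored with small deviations such as 55.00003. Slicing a MultiIndex by its first level requires the index to be sorted, which pd.MultiIndex.from_product guarantees here. The select_angle helper is a hypothetical alternative, not part of pyElli, that makes the tolerance explicit with np.isclose.

```python
import numpy as np
import pandas as pd

# Made-up data: stored angles deviate slightly from the nominal 55° and 56°.
index = pd.MultiIndex.from_product(
    [[55.00003, 55.99998], [300.0, 310.0, 320.0]],
    names=["Angle of Incidence", "Wavelength"],
)
psi_delta = pd.DataFrame(
    {"Ψ": np.linspace(10.0, 15.0, 6), "Δ": np.linspace(180.0, 185.0, 6)},
    index=index,
)

# Exact lookup fails: 55 is not a stored label.
try:
    psi_delta.loc[55]
    exact_lookup_ok = True
except KeyError:
    exact_lookup_ok = False

# Workaround from the thread: slice a small window around the nominal angle.
window = psi_delta.loc[54.9:55.1]


# Alternative (not from the thread): an explicit absolute tolerance on the angle level.
def select_angle(df: pd.DataFrame, angle: float, atol: float = 0.1) -> pd.DataFrame:
    angles = df.index.get_level_values(0).to_numpy(dtype=float)
    return df[np.isclose(angles, angle, rtol=0.0, atol=atol)]


print(exact_lookup_ok, len(window), len(select_angle(psi_delta, 56)))  # False 3 3
```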

Regarding the error message changes you're suggesting, we cannot do much right now, because these errors come from pandas, not from pyElli. However, I agree that this should be reported more cleanly, and we will take it into account when refactoring the data structure.
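Until such a refactoring lands, the messages suggested in the issue could be approximated on the user side. get_psi_delta below is a hypothetical helper, not part of pyElli or pandas; it assumes an (angle, wavelength) MultiIndex and raises a KeyError listing the available angles or wavelength range instead of printing a warning. The wavelength message reports only the minimum and maximum, so it assumes a contiguous grid.

```python
import pandas as pd


def get_psi_delta(df: pd.DataFrame, angle: float, wavelength: float) -> pd.Series:
    """Hypothetical helper: look up one (Ψ, Δ) pair by angle and wavelength,
    raising a KeyError that lists what is available instead of pandas'
    bare KeyError. Assumes a MultiIndex of (angle, wavelength)."""
    angles = df.index.get_level_values(0).unique()
    if angle not in angles:
        raise KeyError(
            f"Angle {angle}° is not available. "
            f"Available angles: {sorted(angles.tolist())}"
        )
    per_angle = df.loc[angle]
    if wavelength not in per_angle.index:
        wl = per_angle.index
        raise KeyError(
            f"Wavelength {wavelength} nm is not available. "
            f"Available wavelengths: [{wl.min():g} - {wl.max():g}] nm"
        )
    return per_angle.loc[wavelength]


# Made-up data for illustration.
index = pd.MultiIndex.from_product(
    [[55.0, 56.0], [300.0, 310.0, 320.0]], names=["Angle of Incidence", "Wavelength"]
)
psi_delta = pd.DataFrame(
    {"Ψ": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "Δ": [181.0, 182.0, 183.0, 184.0, 185.0, 186.0]},
    index=index,
)

print(get_psi_delta(psi_delta, 55, 310))
try:
    get_psi_delta(psi_delta, 55, 1100)
except KeyError as err:
    print(err)
```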

ortrs commented 1 year ago

> This is a bug: your file contains additional dpolE rows, so the wavelength column holds more than just wavelength values, and the importer expects only numeric values in this column. I did a quick hotfix; if you like, you can try it with pip install git+https://github.com/PyEllips/pyElli.git@fix-wvase-importer. Could you give it a quick test to see whether it works for you?

Works perfectly now!

> This is a bug: your file contains additional dpolE rows, so the wavelength column holds more than just wavelength values, and the importer expects only numeric values in this column.

That is something I also noticed. It happens because Woollam ellipsometers (the VASE model at least) can measure depolarization for modeling non-linearities (see https://www.researchgate.net/profile/Shawana_Tabassum/post/Ellipsometry_How_to_get_n_and_d_from_Psi_and_Delta/attachment/5b27cc0bb53d2f63c3d1c11a/AS%3A638890587193344%401529334794039/download/a+short+course+in+ellipsometry.pdf, chapter 6-6, p. 152 by Adobe page index). I am still trying to find a way to post-process this data, but it may be a whole module in itself.

You may have already seen that the depolarization data sits where the "NaN" values start in the first and last columns (very inconvenient to process), but I guess you already figured that out. :-)

The bug is resolved on my end, thanks!

domna commented 1 year ago

> > This is a bug: your file contains additional dpolE rows, so the wavelength column holds more than just wavelength values, and the importer expects only numeric values in this column. I did a quick hotfix; if you like, you can try it with pip install git+https://github.com/PyEllips/pyElli.git@fix-wvase-importer. Could you give it a quick test to see whether it works for you?

> Works perfectly now!

Nice! I would still like to refine the code a bit before I merge it, but I'm not sure I'll have time before I go on holiday. If not, you'll have to use this branch instead of the official release to work with your data until I'm back (4th September). Is that fine?

> > This is a bug: your file contains additional dpolE rows, so the wavelength column holds more than just wavelength values, and the importer expects only numeric values in this column.

> That is something I also noticed. It happens because Woollam ellipsometers (the VASE model at least) can measure depolarization for modeling non-linearities (see https://www.researchgate.net/profile/Shawana_Tabassum/post/Ellipsometry_How_to_get_n_and_d_from_Psi_and_Delta/attachment/5b27cc0bb53d2f63c3d1c11a/AS%3A638890587193344%401529334794039/download/a+short+course+in+ellipsometry.pdf, chapter 6-6, p. 152 by Adobe page index). I am still trying to find a way to post-process this data, but it may be a whole module in itself.

> You may have already seen that the depolarization data sits where the "NaN" values start in the first and last columns (very inconvenient to process), but I guess you already figured that out. :-)

Yes, I'm aware that Woollam instruments can also collect depolarization data. I have also written a module, based on the Cloude decomposition, to extract the depolarization from Mueller matrices; you can find it at https://github.com/PyEllips/pyCloude. For Mueller matrices, you can use it to build a coherent Jones matrix and then fit your data to these matrices. I'm not sure how to do this for depolarization values accompanying Ψ and Δ, but I think it should be possible. However, we don't really have examples of or experience with this. If you'd like to explore it, I'm happy to help.
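For readers unfamiliar with the Cloude decomposition, here is a rough NumPy sketch of the idea; it is not pyCloude's API or implementation. It assumes one common convention for the matrix A that maps J ⊗ J* to a Mueller matrix, and it works with the covariance matrix in the lexicographic basis, which is related to Cloude's Pauli-basis coherency matrix by a change of basis and a normalization factor. A non-depolarizing Mueller matrix yields a single non-zero eigenvalue, and its eigenvector recovers the Jones matrix up to a global phase; additional non-zero eigenvalues indicate depolarization. The Jones matrices below are made up.

```python
import numpy as np

# One common convention for the matrix linking J ⊗ J* to the Mueller matrix (assumption).
A = np.array([[1, 0, 0, 1],
              [1, 0, 0, -1],
              [0, 1, 1, 0],
              [0, 1j, -1j, 0]])
A_INV = A.conj().T / 2  # A @ A^H = 2 I, so this is the exact inverse


def mueller_from_jones(jones: np.ndarray) -> np.ndarray:
    """Non-depolarizing Mueller matrix of a 2x2 Jones matrix."""
    return np.real(A @ np.kron(jones, jones.conj()) @ A_INV)


def cloude_decomposition(mueller: np.ndarray):
    """Return the eigenvalues (descending) of the covariance matrix built from
    a Mueller matrix, and the Jones matrix of the dominant component
    (defined up to a global phase)."""
    # Ensemble average of J ⊗ J*; entry [(i,k), (j,l)] is <J_ij J*_kl>.
    n = A_INV @ mueller @ A
    # Reorder into the Hermitian covariance matrix C[(i,j), (k,l)] = <J_ij J*_kl>.
    cov = n.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    jones = np.sqrt(max(eigvals[0], 0.0)) * eigvecs[:, 0].reshape(2, 2)
    return eigvals, jones


# Made-up Jones matrices; mixing their Mueller matrices adds depolarization.
j1 = np.array([[0.8, 0.1 + 0.2j], [0.05j, 0.6 - 0.3j]])
j2 = np.array([[0.3, -0.2j], [0.1, 0.7]])
pure = mueller_from_jones(j1)
mixed = 0.7 * pure + 0.3 * mueller_from_jones(j2)

print(cloude_decomposition(pure)[0])   # one non-zero eigenvalue: no depolarization
print(cloude_decomposition(mixed)[0])  # two non-zero eigenvalues: depolarizing
```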

Edit: If you want to use this in error calculations, we might take it into account in the ongoing discussion on dispersion errors (see #150).

> The bug is resolved on my end, thanks!

domna commented 1 year ago

I would leave this open until the fix is merged into the main branch (and thus "officially" fixed). It will be closed automatically as soon as I merge.