Quasars / orange-spectroscopy

Other
52 stars 58 forks source link

[FIX] Multifile: fix a bug when wavenumbers differ for less than 1e-6 #660

Closed markotoplak closed 1 year ago

markotoplak commented 1 year ago

Fixes #659

The problem appears because wavenumbers are compared as numbers at one point and as strings at another (using "%f", so rounded).

If wavenumbers collide when converting to string, the required precision is computed and used instead. Some wavenumbers will there be more precise than other.

I tried to keep perfect backward compatibility - whatever could be read should be read exactly the same now.

markotoplak commented 1 year ago

@borondics, I think I fixed it. could you please test this PR with your data?

markotoplak commented 1 year ago

In the future, we could reuse parts of the code in other readers - the same code could help us decrease the number of decimals we use for features. But that could create compatibility problems for old workflows because widgets store feature names, so we'll need to be careful and at least consider that...

codecov-commenter commented 1 year ago

Codecov Report

Merging #660 (73489ac) into master (325cd2c) will decrease coverage by 0.03%. The diff coverage is 100.00%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master     #660      +/-   ##
==========================================
- Coverage   89.42%   89.39%   -0.03%     
==========================================
  Files          71       71              
  Lines       11845    11865      +20     
==========================================
+ Hits        10592    10607      +15     
- Misses       1253     1258       +5     
markotoplak commented 1 year ago

The current code should ensure unique wavenumbers. decimals_neeeded_for_unique_str should return the number of decimals that would construct unique strings.

Then, an additional decimal is added for increased precision, so floats 1.49 and 2.51 yield strings "1.5" and "2.5" instead of "1" and "3" (without the added decimal).

borondics commented 1 year ago

Using 73489ac the Multifile reader doesn't fail but the data is still not completely OK. If you check the Data Table, the WNs are still not equal enough probably and the two different length spectra go to different blocks.

image

Would it be possible to use the same columns for the WNs that are common and have "?" only for those that are missing in the shorter dataset?

markotoplak commented 1 year ago

This bug was that wns should not be equal but Multifile tried to make them equal at the name level. Now it does not.

What you report is expected and intentional behavior, which was done so to be as general as possible. And it is exactly how the widget would work before the fix.

To solve the problem of different blocks we would need to interpolate while reading the files, which for now we do not do. You can interpolate in a separate widget.

To do what you wish, we'd have to make a decision of what to interpolate to, at least, and give users options to use that automatic interpolation or not. I see the benefit of this in terms of speed and memory, but in your particular case Interpolate will suffice.