Closed praneethratna closed 1 year ago
These improvements are very cool!
Since you're adding sympy
as a dependency, how "heavy" a library is it? In other words, is it likely to create dependency resolution issues? I see the dependency tree sympy > mpmath > gmpy2
, where gmpy2 apparently uses a C library ("gmpy2 is an optimized, C-coded Python extension module ...").
Are there lighter-weight alternatives to sympy
that would meet our needs? sympy has tons of capabilities, and here I think we'd be using only a tiny subset.
These improvements are very cool!
Since you're adding
sympy
as a dependency, how "heavy" a library is it? In other words, is it likely to create dependency resolution issues? I see the dependency treesympy > mpmath > gmpy2
, where gmpy2 apparently uses a C library ("gmpy2 is an optimized, C-coded Python extension module ...").
I feel SymPy
is lightweight compared to other mathematical libraries found online. Also SymPy
has a hard dependency only on mpmath and I don't think it would create dependency resolution issues.
Are there lighter-weight alternatives to
sympy
that would meet our needs? sympy has tons of capabilities, and here I think we'd be using only a tiny subset.
I'm using SymPy
for parsing numbers from strings using it's inbuilt parser. I'm not sure of any other libraries which can be used to do this, but there might be some libraries to do this task exclusively?
Also SymPy has a hard dependency only on mpmath but I don't it would create dependency resolution issues.
Right! Scratch my comment about dependency resolution per se. I did see that the dependency was limited to mpmath > gmpy2, and we don't currently use those libraries. My apologies.
Right! Scratch my comment about dependency resolution per se. I did see that the dependency was limited to mpmath > gmpy2, and we don't currently use those libraries. My apologies.
No worries! I understand the clarification now. Since we don't use mpmath or gmpy2 libraries, these dependencies doesn't affect us. But we need to look at alternatives for SymPy
if that makes the package slow.
Since the equations are pretty predictable, one option would be to just do re
all the way?
Since the equations are pretty predictable, one option would be to just do
re
all the way?
@leewujung Do you mean we use a single regex expression to check for the validity of the whole frequency differencing criteria? Wouldn't that make the code a bit messy?
@leewujung Do you mean we use a single regex expression to check for the validity of the whole frequency differencing criteria? Wouldn't that make the code a bit messy?
I think it would be ok? For the freqAB case, the Hz/kHz/MHz/GHz
cases are fixed, so the number in front of that could be extracted. For the chanAB case, it is the string between the double quotes. The binary comparison operator >, <, >=, <=, ==
are the allowed cases. And then we need to extracted the number before dB
. This doesn't seem overly complicated?
I think it would be ok? For the freqAB case, the
Hz/kHz/MHz/GHz
cases are fixed, so the number in front of that could be extracted. For the chanAB case, it is the string between the double quotes. The binary comparison operator>, <, >=, <=, ==
are the allowed cases. And then we need to extracted the number beforedB
. This doesn't seem overly complicated?
@leewujung I think it shouldn't be overly complicated, it's just that i split everything into individual parts for better code debugging. I'll make these changes by today.
@leewujung Even if use a single re
expression to check for validity, we again need to use separate re
expressions to extract freq/chan values, operators and diff or am i understanding your point in a wrong way?
Hi @praneethratna : sorry for my super late response. What I am thinking in terms of using re
to grab out the pattern is like the following:
"(?P<freqA>\d+)(?P<unitA>\w?)Hz"
to match freqA and freqB, for example:
a = re.match("(?P<freq>\d+)\s*(?P<unit>\w?)Hz", "120kHz") # also matches arbitrary space between "120" and "kHz"
a["freq"] # get the number associated with the frequency
a["unit"] # get the unit associated with the frequency
then you can use a["unit"]
to determine the right multiplication factor to get to the numbers in frequency_nominal
, which is with units Hz (i.e. 120kHz is 120000 in frequency_nominal
).
freqA_pattern = "(?P<freqA>\d+)\s*(?P<unitA>\w?)Hz"
freqA_pattern = "(?P<freqB>\d+)\s*(?P<unitB>\w?)Hz"
db_pattern = "(?P<db>\d+)\s*dB"
cmp_pattern = "\s*(?P<cmp>\S+?)\s*"
DIFF_MATCHER = re.compile(freqA_pattern + "\s*-\s*" + freqB_pattern + cmp_pattern + db_pattern)
and then use outputs of the matched results with some error checking and handling (like converting str
to float
, k to 1000 etc) to combine with your other parts of the code.
What do you think?
Hi @praneethratna : sorry for my super late response. What I am thinking in terms of using
re
to grab out the pattern is like the following:
use patterns like
"(?P<freqA>\d+)(?P<unitA>\w?)Hz"
to match freqA and freqB, for example:a = re.match("(?P<freq>\d+)\s*(?P<unit>\w?)Hz", "120kHz") # also matches arbitrary space between "120" and "kHz" a["freq"] # get the number associated with the frequency a["unit"] # get the unit associated with the frequency
then you can use
a["unit"]
to determine the right multiplication factor to get to the numbers infrequency_nominal
, which is with units Hz (i.e. 120kHz is 120000 infrequency_nominal
).you can use the pattern above to build up the whole pattern, something like the following:
freqA_pattern = "(?P<freqA>\d+)\s*(?P<unitA>\w?)Hz" freqA_pattern = "(?P<freqB>\d+)\s*(?P<unitB>\w?)Hz" db_pattern = "(?P<db>\d+)\s*dB" cmp_pattern = "\s*(?P<cmp>\S+?)\s*" DIFF_MATCHER = re.compile(freqA_pattern + "\s*-\s*" + freqB_pattern + cmp_pattern + db_pattern)
and then use outputs of the matched results with some error checking and handling (like converting
str
tofloat
, k to 1000 etc) to combine with your other parts of the code.What do you think?
@leewujung No worries! I understand it now, these suggestions helped me to refactor the code in a much simpler and clean way.
@praneethratna : there are still places where you use sympy
in the code. Could you remove them completely and use re
instead? Thanks!
@leewujung I have removed the usage of sympy
completely and also squashed all the commits into one.
Also, I caught a bug / undesired behavior in the current _check_source_Sv_freq_diff
function. In the current implementation, the values of frequency_nominal
is not allowed to be duplicated:
https://github.com/OSOceanAcoustics/echopype/blob/254ee106eb580959ca85f2131168c4f17c0a285a/echopype/mask/api.py#L463-L467
However, it is possible that there are 2 channels (with different channel
strings) that have the same frequency_nominal
values (this is actually why we decided to use channel
as the coordinate, instead of frequency_nominal
, since coordinates cannot have duplicated values), and users may want to diff these 2 channels using the chanABEq
. I think that potentially duplicated frequency_nominal
values should only be checked if people use freqABEq
to put in their equation.
Looking at the code again, what do you think if we take the following approach to change this function?
# check that channel and frequency nominal are in source_Sv
intactfreqAB
checks are executed when people use freqABEq
, and only chanAB
checks are executed when people use chanABEq
# make sure that the channel and frequency_nominal values are not repeated in source_Sv
# check that the elements of freqAB are in frequency_nominal
# check that the elements of chanAB are in channel
ie we only check for frequency_nominal
to not be duplicated and elements of freqAB
are in frequency_nominal
when people use freqABEq
, and only check for channel
to not be duplicated and elements of chanAB
are in channel
when people use chanABEq
.
Ugh sorry just remember one thing. In here
https://github.com/OSOceanAcoustics/echopype/blob/254ee106eb580959ca85f2131168c4f17c0a285a/echopype/tests/utils/test_processinglevels_integration.py#L106
frequency_differencing
is called (this is something the CI does not catch, unless one uses [all tests ci] (see here). Could you quickly change it? Then I will merge. Thank you!
Merging #1106 (d8b8b26) into dev (6fc5a1d) will decrease coverage by
9.44%
. Report is 53 commits behind head on dev. The diff coverage is93.75%
.
@@ Coverage Diff @@
## dev #1106 +/- ##
==========================================
- Coverage 78.12% 68.69% -9.44%
==========================================
Files 65 8 -57
Lines 6227 658 -5569
==========================================
- Hits 4865 452 -4413
+ Misses 1362 206 -1156
Flag | Coverage Δ | |
---|---|---|
unittests | 68.69% <93.75%> (-9.44%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Files Changed | Coverage Δ | |
---|---|---|
echopype/mask/freq_diff.py | 93.44% <93.44%> (ø) |
|
echopype/mask/api.py | 91.66% <100.00%> (+4.16%) |
:arrow_up: |
... and 63 files with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Addresses #967. Now input for frequency_differencing can be given directly in the form of frequency differencing criteria instead of separating out freq/chan from operators.
CC @leewujung