Fix 443 dev - Githubissues

damhuonglan commented 3 years ago

What does this PR do?

fixing #443

What have been modified?

Commands: ph5tostationxml, ph5validate
Checking for response file inconsistency following 2 formats: metadatatoph5 format [das_model]_[sensor_model]_[sr][cha] and resp_load format [das_model]_[sr]_[srm]_[gain]. Each part in the format will be checked, except for [cha] because different channel codes can use the same response file. Requirement for the check is character _ has to be removed from model before added to the response file name in metadatatoph5 and resp_load command.
ph5tostationxml:
- In case of no response data can be returned because of error, not return empty response but throw error instead. Use flag -E/--emp_resp if want to add empty response for debugging
- In case of other bugs, not create stationxml. Use flag --stationxml_on_error to create stationxml if bug present in the data
- Multiprocessing has been removed because when checking subfolders of the given path, multiprocessing make the logging messages show up randomly. Using for loop instead help users recognize which one the messages are for. That way ph5tostationxml can inform when never stationxml has or has not been created for a ph5 data.

Checklist

[ ] This PR is not directly related to an existing issue (which has no PR yet).
[ ] All tests pass.
[ ] Any new or changed features have are documented.
[ ] Changes have been added to CHANGELOG.txt .
[ ] First time contributors have added your name to CONTRIBUTORS.txt .

damhuonglan commented 3 years ago

@timronan According to what you request to not return empty response, I fixed the old code so that it will:

Raise PH5toStationXMLError if
- (1) All response file names are blank in response_t
- (2) A station’s response_table_n_i has no n_i for it in response_t
- (3) response_t has n_i=-1 (this n_i value is created by metadata when there is no response signal. I feel uncomfortable to raise error for this because the current ph5/test_data/ph5/master.ph5 has this value in response_t)
- (4) response_file_das_a is blank for the checked n_i
- (5) There is no signal for either response_file_das_a or response_file_sensor_a (if response_file_sensor_a isn’t blank)
Log the warning message and return the response data according to the response paths indicated in response_t if:
- resonse_file_sensor_a is blank while model exists.
- resonse_file_sensor/das_a is inconsistent with model(s), sample rate, sample_rate_multiplier, gain or char_code.

I also added flag -E/—emp_resp that allow to return empty response for the cases 2,3,4,5 for the debugging purpose.

It passes the test case that you created. Please let me know if you agree with that. Thank you.

damhuonglan commented 3 years ago

@timronan I have run ph5tostationxml with fix_443_dev and still have Stage's information after InstrumentSensitivity's: 19-017.zip(fix_443_19_017.xml was created by ph5toxmlstation.py in fix_443 branch and fix_443_dev_19_017 was created by fix_443dev branch. They both have stages.) Can you pull the branch again to make sure it is updated and run one more time with different name for xml please? I'm afraid the last time your run it has the error and didn't create xml and you open the xml you created before the last run. (The only cases when I got InstrumentSinsitivity's without Stage's for me is when I use level=channel. Only with level=response it has Stage's information)_

timronan commented 3 years ago

@damhuonglan if I use the flag --response than stages are output that are consistent, based off a diff, with the original PH5toStationXML program. Could you make the output stationxml level be response as the default behavior? By default, ph5tostationxml should output stationxml at a response level which includes all of the stages.

damhuonglan commented 3 years ago

Why is this flag being added? What is it's use case?

Our DG sometimes creates sub-directories under the main one for testing. They don't want to include files under testing folder when running ph5tostationxml. To avoid that they have to create a completely different directory instead of sub-directory. If flag -T is used, only file directly under paths listed in flag -p will be processed. If flag -T isn't used, everything still work like normal. If you don't want that flag. I can revert that commit.

timronan commented 3 years ago

We can leave this flag, but as stated earlier, we need to be targeted in our PRs. This flag seems extraneous to this specific PR and should be associated with it's own issue and PR. We need to be careful about trying to do bulk fixes in PH5 PRs . We need to be able to identify and roll back issues without change significant amounts of code. Also, it is easy for multiple logical changes to be lost in code review.

damhuonglan commented 3 years ago

we need to be targeted in our PRs.

So I will revert it and do this in another PR.

damhuonglan commented 3 years ago

I am slightly concerned about changing PH5toStationxml and Validation on the same branch and in the same pull request.

validation.py is a module that keeps the the common checks between ph5validate and ph5tostationxml. To change the response check in ph5tostationxml, I have to change the functions in validation which results in the changes in ph5validate as well. That is the reason I have to change 3 of them as the same time. @timronan If you don't want to change all 3 files at the same time, let me know so I will move those functions to ph5tostationxml and test those functions in ph5tostationxml. If you think it is ok to keep it this way, the PR is ready for another review. Please review it again.

timronan commented 3 years ago

@damhuonglan point taken on validation.py. Thanks for the explanation, you should leave it how it is already set up. I will work on a reviewing code change. Thanks for working with me.

damhuonglan commented 3 years ago

https://github.com/PIC-IRIS/PH5/pull/445#discussion_r556189560 Examples for the warning from inconsistencies between response table and information in array table: https://github.com/PIC-IRIS/PH5/blob/ccf4017bd9eb8dd327a4bdf5d4dfe65802841288/ph5/clients/tests/test_ph5tostationxml.py#L382-L385 https://github.com/PIC-IRIS/PH5/blob/ccf4017bd9eb8dd327a4bdf5d4dfe65802841288/ph5/clients/tests/test_ph5tostationxml.py#L386-L389 https://github.com/PIC-IRIS/PH5/blob/ccf4017bd9eb8dd327a4bdf5d4dfe65802841288/ph5/clients/tests/test_ph5tostationxml.py#L392-L394

timronan commented 3 years ago

Could the Response_T Warnings and Errors that are output by ph5_validate be moved under the station and channels levels to which they belong. Right now ph5_validate outputs these warnings and error as a disorganized list under the Response_T header:

-=-=-=-=-=-=-=-=-
Response_t
0 error, 240 warning, 0 info
-=-=-=-=-=-=-=-=-
WARNING: array 001, station 13, channel 3, response_table_n_i 0: response_file_das_a 'ZLAND_3C' is inconsistence with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPZ'.
WARNING: array 002, station 33, channel 3, response_table_n_i 0: response_file_das_a 'ZLAND_3C' is inconsistence with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPZ'.
WARNING: array 002, station 56, channel 1, response_table_n_i 0: response_file_das_a 'ZLAND_3C' is inconsistence with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPN'.
WARNING: array 002, station 39, channel 2, response_table_n_i 0: response_file_das_a 'ZLAND_3C' is inconsistence with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPE'.
WARNING: array 001, station 26, channel 3, response_table_n_i 0: response_file_das_a 'ZLAND_3C' is inconsistence with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPZ'.
WARNING: array 001, station 10, channel 1, response_table_n_i 0: response_file_das_a 'ZLAND_3C' is inconsistence with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPN'.
WARNING: array 003, station 104, channel 1, response_table_n_i 0: response_file_das_a 'ZLAND_3C' is inconsistence with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPN'.

I am proposing that they should look similar too:

-=-=-=-=-=-=-=-=-
Station 1 Channel 2
0 error, 2 warning, 0 info
-=-=-=-=-=-=-=-=-
WARNING: No station description found.
WARNING: array 001, station 1, channel 2, response_table_n_i 0: response_file_das_a 'ZLAND_3C' is inconsistence with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPE'.
WARNING: Data exists after pickup time: 64 seconds.
-=-=-=-=-=-=-=-=-

This organization will make it easier to read the output from ph5_validate.

damhuonglan commented 3 years ago

Could the Response_T Warnings and Errors that are output by ph5_validate be moved under the station and channels levels to which they belong.

Let me try.

damhuonglan commented 3 years ago

@timronan I've removed the Response_T Warnings and Errors that are output by ph5_validate to be under the station and channels levels to which they belong. Please look at that to see if it meets your requirements.

timronan commented 3 years ago

@damhuonglan thanks for putting in this effort. I will do a review first thing next week 01/19/2021.

timronan commented 3 years ago

The warnings now seems to be included with the stations and channels to which they belong, but the warning messages are not exact, and they are a little bit confusing. Fore example: I receive the warning message:

WARNING: response_file_das_a 'ZLAND_3C' is inconsistent with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPE'.

For all channels in experiment 18-020. After doing a little bit of digging I found that ph5tokef --Response_t -n master.ph5 -p ./ returns response_file_das_a=/Experiment_g/Responses_g/ZLAND_3C ph5tokef --all_arrays -n master.ph5 -p ./ returns das/model_s=ZLAND_3C and ph5tokef --Sort_t -n master.ph5 -p ./ returns nothing.

The this warning is accurately being triggered but it is not directly telling us the problem. If fact the current message is telling us incorrect information saying that das_model='ZLAND3C'. It should say something like:

Station = ' ' Chan = ' ' Sort_T:sensor_model='' is inconsistent with Response_t:response_file_das_a='ZLAND_3C' and Array_T:'das/model_s='ZLAND_3C'

There should be a different message triggered by if else statements for each all of the combinations on differences between the tables. If this was made into a function we could potentially re-use in some of the other issues.

The output messages should take information directly from the metadata tables. It should not be hard coded because of potential differences as displayed above.

damhuonglan commented 3 years ago

The response_file_das_a must be in either respload format (`[das model][sample rate][sample rate multiplier][gain]) or metadatatoph5 format ([das model][sensor model][sample rate][channel code]`. According to the warning message, there is only das model in the response_file_das_a, which is wrong format.

If das model ='ZLAND', sensor model/ response_file_sensor_a is allowed to be blank.

Normally, I only see the the inconsistencies with different model, sample_rate, etc. I haven't seen the problem with wrong format like this. Should I add the format guideline in the warning message to prevent this situation? For example WARNING: response_file_das_a 'ZLAND_3C' is inconsistent with sensor_model='' and das_model='ZLAND3C'; sr=250 srm=1 gain=18 'cha=DPE'. Please check with format [das_model]_[sr]_[srm]_[gain] or [das_model]_[sensor_model]_[sr][cha].

timronan commented 3 years ago

The foreign key between the Response_t, Sorts_T and Array_T when it comes to DAS Models is Sorts_T:response_table_n_i==Array_T:response_table_n_i==Response_t:n_i.

Why does response_file_das_a need to be output as the resp_load format or the metadatatoph5 format? It seems like the n_i should be enough to properly identify the correct response. Am I missing something?

ascire-pic commented 3 years ago

I don't think it actually matters what format the response_file_das_a field is in, as long as it is populated with something valid, whether that is in the metadatatoph5 or resp_load format. I could think of a few cases where, after loading data with metadatatoph5, the array tables could then be edited to change the das or sensor model. That shouldn't be a problem as long as the n_i is pointing to a response.

ascire-pic commented 3 years ago

As an additional note, I think the experiment Tim is referencing is a bit older and therefore the DAS file name field is not as complete as the ones we started using later. Ideally, to identify a unique response, you would need at the minimum the instrument type (ex ZLAND_3C, RT130_CMG3T), the sample rate, the gain (particularly for the ZLand or SmartSolos) and maybe the channel name (Less important because in the majority of cases, all channels have the same response, just a different orientation).

So based on that, maybe we should add a warning like Lan mentioned to indicate that a user should check and make sure it is okay. If you only have one configuration/instrument, then just having ZLAND_3C as the das_file would be fine, but if you have multiple responses due to multiple configurations, it might be worth the flag to the user that they should check and make sure they actually loaded the correct response. With the less descriptive das_file_name field, there could be some uncertainty that a user loaded the correct response, so the flag might be useful.

I would just make it a warning. I don't think it's something that should cause writing out stationXML or anything to fail.

timronan commented 3 years ago

I thought this was an internal consistency check between the response tables and the array table but it seems that the check is between a set list of das models and what is contained in the metadata. Is that correct?

damhuonglan commented 3 years ago

I just simply checked if the response filename in response table match with the name may be created from resp_load or metadatatoph5 because I think those are the standard ways to create rows in the Response_t. The information I got to check was from array table.

However it seems not necessary. If we want to change that, we need to set up rules for what to check, and expect to see the changes in code and test files. Also, please give me the warning messages you want for each check when it's failed.

According to what Allisa said, I suggest the following rules:

response file name must be unique. (This will be checked by looking at response table only before looking at array table)
response file name must contain the model in the corresponding ni the array table. (Some minor changes are allowed for model: ',-=. ' can be added or removed from the model)
If you want to check for sample rate, gain, channel code in the filename, please add more to the rules. When we have the final rules, I will add them to the description of this PR then change the code for that.

Thank you for your contribution.

timronan commented 3 years ago

@damhuonglan and all I really like this approach and completely agree. We should have specific targeted fixes that are talked about through logical rules. Each PR should only be associated with a few logical rules or be associated with one or two test at most. This will help us keep our PR's targeted and concise. I think that that both of your proposed rules seem to encompass this PR.

response file name must be unique. (This will be checked by looking at response table only before looking at array table)

How will this uniqueness be checked? It seems like you could have multiple responses with the same DAS and Sensor name that are operating at different sample rates and gains. What do you think about

The [das model][sensor model][sample rate][sample rate multiplier][gain] for each response in the response_table should be compared to one another. If there are two responses with exact matches than a warning should be thrown. An error should be thrown that should say something similar to:

Response [n_i1] is a duplication of Response [n_i2]

response file name must contain the model in the corresponding ni the array table. (Some minor changes are allowed for model: ',-=. ' can be added or removed from the model) seem like the correct rules for this PR. These are also very testable.

If Array_T:response_table_n_i==Response_t:n_i THAN Response_T:response_file_das_a must equal Array:das/model_s else ERROR

Message:

[Station] [Channel] Array:das/model_s is inconsistent with Response_T:Response n_i. Please update these metadata.

I think these rules cover this PR but maybe missing something. What does everyone think?

damhuonglan commented 3 years ago

Can we discuss about this on the next PH5 meeting? Bruce wants us to talk more about this. He'd like to have some input too.

hrotman-pic commented 3 years ago

I have a question about the proposed sensor model consistency check. This is from fields populated using the metadatatoph5 format. In this example, the metadata is a stationXML created using Nexus. These fields have not been modified after running metadatatoph5.

Field in array table: sensor/model_s=L-28LB, 4.5 Hz, 27.0 V/m/s, Rc=395 Ohms, Rs=2490 Ohms

Field in response table: response_file_das_a=/Experiment_g/Responses_g/RT130_L28LB45Hz270VmsRc395OhmsRs2490Ohms_500DH2

Will this fail the check? If yes, please let me know if I should open an issue for it, or if we should wait until a PH5 meeting to discuss if it would be better to prevent it failing the check in the code or a different way. Also please let me know if you need more information.

damhuonglan commented 3 years ago

@hrotman-pic The following characters will be removed from sensor model before checking with response_file_dasa: ',' '-' '=' '.' '' ' '. So the sensor model will turn into: L28LB45Hz270V/m/sRc395OhmsRs2490Ohms. It has a little bit problem here is the '/' is not removed. But I can fix it. I think 500 is sample rate and DH2 is channel code. Correct? Thanks for giving this so I can fix my code. I will need you to check this after the fix.

timronan commented 3 years ago

response_file_das_a=/Experiment_g/Responses_g/RT130_L28LB45Hz270VmsRc395OhmsRs2490Ohms_500DH2

Should not be an acceptable. If a program is automatically creating response_file_das_a in this way, then the program needs to be modified. The need response_file_das_a field needs to be stripped all additional strings (units) of the special characters.

The check needs to be very specific:

respload format: [das model][sample rate][sample rate multiplier][gain] [string][int][int][int] metadatatoph5 format: [das model][sensor model][sample rate][channel code] [string][string][int][string]

If we start following a pattern of semi random strings there is no point for this validation check.

Realistically these formats should be consistent but that is a different PR.

damhuonglan commented 3 years ago

Should not be an acceptable. If a program is automatically creating response_file_das_a in this way, then the program needs to be modified. The need response_file_das_a field needs to be stripped all additional strings (units) of the special characters.

@timronan If you look at the field in array table sensor/model_s=L-28LB, 4.5 Hz, 27.0 V/m/s, Rc=395 Ohms, Rs=2490 Ohms that @hrotman-pic quote, the response_file_das_a=/Experiment_g/Responses_g/RT130_L28LB45Hz270VmsRc395OhmsRs2490Ohms_500DH2 is following the format metadatatoph5 format: [das model]_[sensor model]_[sample rate]_[channel code] if the following characters are omitted: ',' '-' '=' '.' '_' ' ' '/'.

So it should be acceptable if I add '/' in the list of characters to be ignored because others already in the list.

timronan commented 3 years ago

Alright, I get it. As @damhuonglan suggest, all of the special characters should be stripped from the check and this should work. I don't like it that:

Field in array table: sensor/model_s=L-28LB, 4.5 Hz, 27.0 V/m/s, Rc=395 Ohms, Rs=2490 Ohms

Is acceptable in the array table but I don't think there is anything we can do about that one.

hrotman-pic commented 3 years ago

@damhuonglan thank you for looking at my example and explaining. Do you want me to look at the other metadatatoph5 test archives I did a while ago and see if there are any additional special characters?

@timronan I don't like it being that way in the array table either. For example, the array kef in question cannot be converted to a csv (keftocsv fails). I have been considering opening an issue for metadatatoph5 to strip out commas during load (perhaps replaced with a field separator that won't cause problems?). Or, open an issue to have a discussion about how to best prevent things we don't want in the array tables when we load metadata with metadatatoph5 without removing potentially needed/useful information about the das or sensor. But, I'm not attached to being the one to start it, and there may be other problems with this example that I don't immediately see.

timronan commented 3 years ago

@hrotman-pic you should start a new issue about metadatatoph5. It should say something about throwing an error if special characters or spaces are trying to be translated from metadata into ph5. It shouldn't be allowed. As you said, this causes bugs throughout our code stack that and this is not clean data. The user should have to figure out how to describe the field they are trying to describe without spaces or special characters. Also, if we are going to touch metadatatoph5 you should roll issue #448 into the discussion. This is really not an enhancement and is actually a bug fix. Thanks for bringing this example to everyone's attention.

@damhuonglan strip all of the special characters and spaces and compare. That should be a sufficient work around for this PR. This PR is not associated with logic in metadatatoph5. They need to stay separated.

damhuonglan commented 3 years ago

@ascire-pic complained about the confusing error messages for blank sensor model when filename following metadatatoph5 format failed the check even for only one part of it. So I decided to:

split check_resp_file_name() into 3 different functions: check_metadatatoph5_format() check_das_resp_load_format() check_sensor() to be easier to process.
In check_metadatatoph5_format() check_das_resp_loadformat(), I split the filename into parts according to '' and try to check each parts acording to metadatatoph5 or resp_load format. That way the error message will be a lot more concise and it can decise the format of the filename better to avoid extra confusing error message

For example with the case response_file_das_a='NoneQ330_NoneCMG3T_100LHN'; das_model='rt125a'

The check in the old way will give 2 error messages:

array 003 station 0407, channel 1: Response_t[6]:response_file_sensor_a is blank while sensor model exists.

array 003 station 0407, channel 1: Response_t[6]:response_file_das_a 'NoneQ330_NoneCMG3T_100LHN' is incomplete or inconsistent with Array_t_003:sensor_model=NoneCMG3T Array_t_003:das_model=rt125a Array_t_003:sr=100 Array_t_003:srm=1 Array_t_003:gain=1 Array_t_003:cha=DPZ. Please check with format [das_model]_[sr]_[srm]_[gain] or response_file_sensor_a 'gs11v' is inconsistent with Array_t_009:sensor_model=cmg3t.

The check in the new way is only 1 message and more concise: array 003 station 0407, channel 1: Response_t[6]:response_file_das_a 'NoneQ330_NoneCMG3T_100LHN' is inconsistent with Array_t_003:das_model=rt125a. Please check with format [das_model]_[sensor_model]_[sr][cha].

For metadatatoph5 format, channel code check will be skipped because different channel code may use the same response file (according to @ascire-pic )
The reason I didn't use this approach before is because the model sections in the filename may contain _. So I will create a PR to strip off _ from model before adding to response filename in metadatatoph5 and resp_load commands.

@ascire-pic Please pull fix_443_dev again and check if it is the way you want. @timronan I know you don't want to change too much in this PR. If you think I only need to add special characters in the list and finish with this I will revert this change and ask @ascire-pic to create issue about the confusing messages and add this changes to the PR that fix that issue. But at first I would like to hear from you to see if it's worth to make these changes.

timronan commented 3 years ago

@damhuonglan I agree with at @ascire-pic the error message is too confusing. There should be functions to check both of the formats for historical metadata. As @damhuonglan stated the metadatatoph5 check should not check the channel codes. As was stated by @ascire-pic multiple channels can point to the same response.

For the channel code problem and other reasons, the metadatatoph5 function and resp_load function need to generate consistent response_file_das_a file names as is outlined and discussed in issue #448.

ascire-pic commented 3 years ago

I just ran ph5_validate and ph5tostationxml on the test archive I was using that added responses via metadatatoph5 and with the changes @damhuonglan made, the output makes more sense. The test archive did not throw any incorrect errors with respect to the das_file_name field and the stationXML file looks correct and complete. Originally ph5_validate and ph5tostationxml were complaining about the channel name in the das_file_name field which should not have been an issue. That error is no longer present.

timronan commented 3 years ago

What do we think about making this check to only verify that [das model][sensor model][sample rate] [String][String][Int] is consistent between the the check_resp_file_name() and the array_table. Although the Response_T[N]:response_file_das_a field will have to be parsed, these fields seem to be the most unique identifiers in the the respload and metadata2ph5 formats. This will set us up to work on issue #448. If the sensor model is missing from both fields than that field would be skipped.

damhuonglan commented 3 years ago

Furthermore, the new error message that is output:

ph5.clients.ph5tostationxml - ERROR: 002-0407-1 response_table_n_i 0: Response das file name should be 'NoneQ330_100_1_1' or 'NoneQ330_NoneCMG3T_100LHN' instead of 'NoneQ330_NoneCMG3T_200HHN'.

@timronan I don't know how you checked out and updated the PR. This message looks like the very old version, before we fix anything on this. Please check it. The new error message must be something like: array 002, station 0407, channel 1: Response_t[5]:response_file_das_a 'NoneQ330_CMG3T_200HHN' is inconsistent with Array_t_002:sensor_model=NoneCMG3T. Please check with format [das_model]_[sensor_model]_[sr][cha].

timronan commented 3 years ago

Thanks for the message. I'm checking again.

timronan commented 3 years ago

Okay, it seems like the branch was rolled back to commit 71b1462, the messaging looks a lot better but there are still some issues.

1: The actual value in the Array_T:das/model_s is inconsistent with the value that is output by the message. The message should not transform the field's content, it is confusing. In experiment 18-0202 the message states Array_t_001:das_model=ZLAND3C but if the array table is queried Array_t_001:das_model=ZLAND_3C. This needs to be resolved. The

2: Are we checking for the channel code or not? The message below is unclear when it comes to this idea. ERROR: array 002 station 0407, channel 1: Response_t[0]:response_file_das_a 'NoneQ330_NoneCMG3T_200HHN' is inconsistent with Array_t_002:sr=100. Please check with format [dasmodel][sensormodel][sr][cha]

3: Stationxml should not be output if an Error message is thrown in ph5tostationxml. It seems like the point of this pull request is that we are unsure of which Response_Table entry is associated with which channel.

4: The format's name need to be called out. The message needs to say to check either the Resp_Load or Metadata to ph5 format.

ascire-pic commented 3 years ago

I thought we were going to output stationxml even if the error occurred to avoid accidentally breaking any archives that already exist at the DMC?

timronan commented 3 years ago

In general, errors should stop a program's execution. For 3: in the list above, the metadata are stored separately from the PH5 data in the archive and if this error does not automatically output stationxml than the metadata in the archive should not be affected. To address the concern about outputting stationxml from historical potentially incorrect experiments, a flag, --stationxml_on_error or something similar, should be developed that outputs stationxml, following the original lph5tostationxml pattern, even if an error is thrown.

damhuonglan commented 3 years ago

@timronan so you want flag --stationxml_on_error to log error when there are errors. Otherwise, there are nothing will be logged?

timronan commented 3 years ago

@damhuonglan no. This error should always be thrown and logged regardless of flags. That is the point of this validation check, correct?

Stationxml files should only be output from ph5tostationxml if 1: this bug is not present in the data. Or the flag --stationxml_on_error is included in the function call.

timronan commented 3 years ago

This version is working really well and we are very close to the end of this PR, except we need to discuss what should be done about the test that @timronan just pushed, test_ph5tostationxml:test_Response_NI_DUPLICATION.

The response table, found at PH5/ph5/test_data/response_table_n_i_dup/Response_Table_Duplication.kef in this case has duplicated response n_i values. The --stationxml-on-error flag does not work for this test case. The master branch of ph5 outputs stationxml documents with a response table configured in this way so the flag --stationxml-on-error should also output stationxml from a PH5 expirment with this metadata configuration. It seems like taking a FIFO logical approach is the best way to handle this issue. The first response with a unque n_i value should likely be used.

damhuonglan commented 3 years ago

The --stationxml-on-error flag does not work for this test case.

Let look at your test case for duplicated response_n_i: https://github.com/PIC-IRIS/PH5/blob/b7dc1c32047ce6dca221c33a857275cf4a9013c6/ph5/clients/tests/test_ph5tostationxml.py#L585-L591

In line 591, you pass "--stationxml_on_error" as argument for getParser. However, in getParser() line 41, station_xml_on_error is always True: https://github.com/PIC-IRIS/PH5/blob/b7dc1c32047ce6dca221c33a857275cf4a9013c6/ph5/clients/tests/test_ph5tostationxml.py#L21-L44

The argument you passed is at the position of minlat, so minlat="--stationxml_on_error" which creates the error:

  File "/Users/field/Documents/GIT/PH5/ph5/core/ph5utils.py", line 142, in is_rect_intersection
    minlat) > float(latitude):
ValueError: could not convert string to float: --stationxml_on_error

I wrote getParser() to test the data return in case stationxml_on_error=True.

The below lines is to test if the flag stationxml_on_error work or not: https://github.com/PIC-IRIS/PH5/blob/b7dc1c32047ce6dca221c33a857275cf4a9013c6/ph5/clients/tests/test_ph5tostationxml.py#L123-L143

Then to fix your test case, just simply remove line 591

https://github.com/PIC-IRIS/PH5/blob/b7dc1c32047ce6dca221c33a857275cf4a9013c6/ph5/clients/tests/test_ph5tostationxml.py#L591

timronan commented 3 years ago

When I run: ph5tostationxml -n master.ph5 -p ./ both errors and the stationxml are output:

[2021-02-22 22:35:51,845] - ph5.clients.ph5tostationxml - ERROR: 006-6046-1 response_table_n_i 0: response_file_sensor_a is blank while sensor model exists.
[2021-02-22 22:35:51,845] - ph5.clients.ph5tostationxml - ERROR: 006-6047-1 response_table_n_i 0: Response das file name should be 'Geode_1000_1_12' or 'Geode_GS11D_1000GPZ' instead of 'ZLAND3C_1000_1_12'.
[2021-02-22 22:35:51,845] - ph5.clients.ph5tostationxml - ERROR: 006-6047-1 response_table_n_i 0: response_file_sensor_a is blank while sensor model exists.
[2021-02-22 22:35:51,845] - ph5.clients.ph5tostationxml - ERROR: 006-6048-1 response_table_n_i 0: Response das file name should be 'Geode_1000_1_12' or 'Geode_GS11D_1000GPZ' instead of 'ZLAND3C_1000_1_12'.
[2021-02-22 22:35:51,845] - ph5.clients.ph5tostationxml - ERROR: 006-6048-1 response_table_n_i 0: response_file_sensor_a is blank while sensor model exists.
[2021-02-22 22:35:51,845] - ph5.clients.ph5tostationxml - ERROR: Response_t n_i(s) duplicated: 0,1
<?xml version='1.0' encoding='UTF-8'?>
<FDSNStationXML xmlns:iris="http://www.iris.edu/xml/station/1/" xmlns="http://www.fdsn.org/xml/station/1" schemaVersion="1.1">
  <Source>PIC-PH5</Source>
  <Sender>IRIS-PASSCAL-DMC-PH5</Sender>
  <Module>PH5 WEB SERVICE: metadata | version: 1</Module>
  <ModuleURI></ModuleURI>
  <Created>2021-02-22 22:35:52.137374</Created>
  <Network code="2E" endDate="2019-01-15T06:21:45.000000Z" startDate="2019-01-06T02:25:05.000000Z" iris:PH5ReportNum="18-030">
    <Description>Thwaites Margins</Description>
    <TotalNumberStations>173</TotalNumberStations>
    <Station code="1010" endDate="2019-01-15T02:44:29.000000Z" startDate="2019-01-06T03:41:12.000000Z">
      <Latitude unit="DEGREES">-79.421768</Latitude>
      <Longitude unit="DEGREES">-111.968945</Longitude>
      <Elevation unit="METERS">1770.2</Elevation>
      <Site>
        <Name>Converted from UTM Zone 12S</Name>
      </Site>
      <CreationDate>2019-01-06T03:41:12.000000Z</CreationDate>
      <TerminationDate>2019-01-15T02:44:29.000000Z</TerminationDate>
      <TotalNumberChannels>9</TotalNumberChannels>
      <SelectedNumberChannels>3</SelectedNumberChannels>
      <Channel code="GP1" endDate="2019-01-15T02:44:29.000000Z" locationCode="" startDate="2019-01-06T03:41:12.000000Z" iris:PH5ReceiverId="1010" iris:PH5Component="1" iris:PH5Array="001">
        <Latitude unit="DEGREES">-79.421768</Latitude>
        <Longitude unit="DEGREES">-111.968945</Longitude>
        <Elevation unit="METERS">1770.2</Elevation>
        <Depth unit="METERS">0.0</Depth>
        <Azimuth unit="DEGREES">0.0</Azimuth>
        <Dip unit="DEGREES">0.0</Dip>
        <SampleRate>1000.0</SampleRate>
        <Sensor>

Stationxml should only be output when stationxml_on_error is set to true.

damhuonglan commented 3 years ago

@timronan The error messages go back to the old style again. Please run 'git log' and look for the last commit in this PR. I think your PH5 went back to the old version. Please look at this comment: https://github.com/PIC-IRIS/PH5/pull/445#issuecomment-778318308

timronan commented 3 years ago

git pull
Already up-to-date.
(ph5) -bash-4.2$ 
(ph5) -bash-4.2$ git log
commit 9b8daaf4bbfad55cb17bfead1014670e6aa01a76
Merge: 39cf326 8fd0fee
Author: Lan <ldam@passcal.nmt.edu>
Date:   Mon Feb 22 11:44:20 2021 -0700
    merge master

I checked my result again with the same answer.

ph5tostationxml -n master.ph5 -p ./
[2021-02-23 01:43:34,317] - ph5.clients.ph5tostationxml - ERROR: 004-4001-1 response_table_n_i 0: Response das file name should be 'Geode_1000_1_12' or 'Geode_GS11D_1000GPZ' instead of 'ZLAND3C_1000_1_12'.
[2021-02-23 01:43:34,318] - ph5.clients.ph5tostationxml - ERROR: 004-4001-1 response_table_n_i 0: response_file_sensor_a is blank while sensor model exists.

damhuonglan commented 3 years ago

You should try from the folder of the code, run python ph5tostationxml.py -n master -p path/to/ph5. Because if you look at test in test_ph5tostationxml: https://github.com/PIC-IRIS/PH5/blob/9b8daaf4bbfad55cb17bfead1014670e6aa01a76/ph5/clients/tests/test_ph5tostationxml.py#L319-L326

What showing in the messages you show doesn't match with the test that passed. You have something wrong in git setting and the ph5 that is running.

You can also check by activate conda environment ph5, then run pip list and check if the folder listed next to ph5 point to the git of ph5 that you just updated by running git pull.

timronan commented 3 years ago

The version number changes, which is good, but I needed to run pip install -e . to use the new version. It is working now and I am beginning to test. Sorry for the confusion.

damhuonglan commented 3 years ago

@timronan I have updated CHANGELOG. However, there is one of your test seem to not be completed yet. That is test_Response_NI_DUPLICATION() in test_ph5tostationxml.py at line 597. I had some comments on that about testing for n_i duplication should be on read_network() instead of read_stations(). Please read that part and complete the test. Thank you.

timronan commented 3 years ago

@damhuonglan the test test_read_networks_response_duplication is the coverage that I was trying to create in test_Response_NI_DUPLICATION(). I removed test_Response_NI_DUPLICATION() so we don't double our testing coverage. Do you think we are ready to merge?

damhuonglan commented 3 years ago

@timronan Yes. I have nothing against that. You can merge that if you feel good about that.

PIC-IRIS / PH5

Fix 443 dev #445

What does this PR do?

What have been modified?

Checklist