PIC-IRIS / PH5

Library of PH5 clients, apis, and utilities

Archive contains malformed data; duplicate stations #403

Open · damhuonglan opened this issue 4 years ago

damhuonglan commented 4 years ago

Describe the bug
There was a reported bug in pforma that duplicated some stations in array_t (#360). It was fixed in PR #361. However, some existing experiments still contain duplicated stations.

The old ph5_validate had no problem with these experiments, but with the current ph5_validate the duplicated stations cause the error "list index out of range".

To Reproduce
Experiments with duplicated stations can be found at /Desktop/ph51/completed/20-009_Playa2 or /ph51/completed/18-028_BASIN.

Expected behavior
ph5.utilities.ph5validate - ERROR: list index out of range

dsentinel commented 4 years ago

If I understand correctly, this is not a bug in the software; rather, some old PH5s are malformed?

damhuonglan commented 4 years ago

@dsentinel The duplicated stations were caused by a bug in pforma. The bug has been fixed, but most of the old PH5s were already malformed by it. PR #404 gives a warning about the duplicates without letting them interfere with checking for other issues when running ph5validate. These malformed PH5s caused no harm before; they were still submitted to the DMC without problems. The malformation has only been exposed by the recent changes to ph5validate.
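The "warn, but keep validating" behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in PR #404: array_t rows are modeled as plain dicts, and the key fields (`id_s`, `channel_number_i`, `deploy_time/epoch_l`) as well as the helper names `split_duplicate_stations` and `check_array` are hypothetical choices made for this example.

```python
import logging
from typing import Dict, List, Sequence, Tuple

# Hypothetical key fields, assumed for illustration only; the actual
# array_t columns that identify a station entry may differ.
DEFAULT_KEY_FIELDS = ("id_s", "channel_number_i", "deploy_time/epoch_l")

LOG = logging.getLogger("ph5validate_sketch")


def split_duplicate_stations(
    rows: Sequence[Dict],
    key_fields: Sequence[str] = DEFAULT_KEY_FIELDS,
) -> Tuple[List[Dict], List[Dict]]:
    """Return (unique_rows, duplicate_rows), keeping the first row seen for each key."""
    seen = set()
    unique: List[Dict] = []
    duplicates: List[Dict] = []
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates


def check_array(array_name: str, rows: Sequence[Dict]) -> List[Dict]:
    """Log a warning for each duplicate instead of failing, then return the de-duplicated rows."""
    unique, duplicates = split_duplicate_stations(rows)
    for row in duplicates:
        LOG.warning("%s: duplicate station entry %s", array_name, row.get("id_s"))
    return unique


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    station = {"id_s": "500", "channel_number_i": 1, "deploy_time/epoch_l": 100}
    rows = [station, dict(station), {"id_s": "501", "channel_number_i": 1, "deploy_time/epoch_l": 100}]
    print(len(check_array("Array_t_001", rows)))  # prints 2
```

Keeping the first occurrence means later checks see each station exactly once, so code that assumes one entry per station does not index past the end of a list.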

timronan commented 4 years ago

@damhuonglan The malformed PH5s stored at the IRIS DMC may, and likely do, have errors. We set up a backdoor in our loader software to load historical PH5 metadata regardless of whether it passes the StationXML validator. We had to do this during the PH5 metadata migration because a significant portion of historical PH5 experiments don't pass validation. Duplicated stations lead to 111 errors.

dsentinel commented 4 years ago

The PRs have fixed the issue in the software, but not in the data already in the archive. I'm leaving this open.