TUW-GEO / ismn

Readers for the data from the International Soil Moisture Network
https://ismn.earth/en/
MIT License
39 stars 21 forks source link

Bug fix for __read_csv() method to now read RISMA network data of "header & files" format again #67

Closed nfb2021 closed 1 year ago

nfb2021 commented 1 year ago

General:

There was a Problem with reading data from the RISMA network. This data only contained soil moisture as variable and was of format "header & files". Further, this data was dowloaded from the ISMN webpage May 22, 23 at 09:00.

Problem:

To my understanding, all sensor files of the "header & files" format should contain four columns: date/time, data, new flag, old flag. For some RISMA sensor files the last column, old flag, contained strings such as:

The method __read_csv() of the DataFile class could not parse these sensor files and threw following error: pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 23, saw 9. This phenomenon only occurred with some RISMA sensor files, other networks were fine.

Fix:

Note:

wpreimes commented 1 year ago

I see that the pipeline for windows tests is not working in master right now. I will fix this. Afterwards you update your branch with the changes in master and then the tests should run here as well.

wpreimes commented 1 year ago

Regarding test data, it's fine to add a new file to check the new changes. Generally we should keep the number of test files low, so maybe if we don't need all of the newly added files, you can delete some of them?

wpreimes commented 1 year ago

and please add one sentence on the fixed bug to CHANGELOG.rst (under unreleased changes)

coveralls commented 1 year ago

Coverage Status

Coverage: 88.329% (+0.03%) from 88.296% when pulling c5362248ec1c27f5044ad1e3872924367d65982e on nfb2021:master into 08ce41112340b258acf942829b1979be20234edd on TUW-GEO:master.

wpreimes commented 1 year ago

ok, looks good. Tests are passing now. Do you still want to change anything here?

nfb2021 commented 1 year ago

ok, looks good. Tests are passing now. Do you still want to change anything here?

I implemented all changes you requested, so as of now I am happy with how it is. Once I have the CEOP data I will try it and if something needs to be done, I will have a look at it

wpreimes commented 1 year ago

I just noticed two more things :) 1) the folder .vscode is still there, I assume because you committed before updating the .gitignore file. There is a command to delete an (ignore) folder from the git history. Can you check that? I want to avoid having that in the upstream repo 2) The folder RISMA_test_data should either be deleted if it's not needed anymore or integrated into the actual test data if you want to include reading the data as part of the available tests. Both are fine for me. If you want to include the data, please copy the network folder to e.g. test/test_data/Data_seperate_files_header_20170810_20180809, make sure in the tests, that the RISMA data is correctly read, and if necessary update the tests.

nfb2021 commented 1 year ago

I just noticed two more things :)

1. the folder .vscode is still there, I assume because you committed before updating the .gitignore file. There is a command to delete an (ignore) folder from the git history. Can you check that? I want to avoid having that in the upstream repo

It should be removed now from git

2. The folder RISMA_test_data should either be deleted if it's not needed anymore or integrated into the actual test data if you want to include reading the data as part of the available tests. Both are fine for me. If you want to include the data, please copy the network folder to  e.g. test/test_data/Data_seperate_files_header_20170810_20180809, make sure in the tests, that the RISMA data is correctly read, and if necessary update the tests.

I just deleted the directory. With everything going on with the pull request that is already quite some input. Adapting the tests is something for the next time

wpreimes commented 1 year ago

great, thanks.