ARGANS / shellfish_and_algae-MODEL


Dataimport issues #7

Open nmaltsev opened 2 years ago

nmaltsev commented 2 years ago

I have analyzed the issue raised by Margaux and found that the dataimport scripts raise exceptions while loading the datasets. These exceptions do not stop the script from executing, so the only sign that the step failed is that some dataset files are missing from the target directory.

I have collected some exceptions from her model that may be of interest to us:

Temperature parameter:

RuntimeError: python -m motuclient --motu https://my.cmems-du.eu/motu-web/Motu --service-id BALTICSEA_REANALYSIS_PHY_003_011-TDS --product-id dataset-reanalysis-nemo-dailymeans --longitude-min 9.0138 --longitude-max 30.2358 --latitude-min 48.4917 --latitude-max 65.8914 --date-min "2021-01-01 00:00:00" --date-max "2021-02-01 00:00:00" --variable thetao --out-dir /media/share/data/20d95f1a836dacf600ec35983a5cc194/_dataset/Temperature/ --out-name TemperatureBalticmodelNetCDF2021-01-01to2021-02-01.nc --user mjaouen --pwd Azerty123456 --depth-min 0 --depth-max 20, 
2022-09-01 14:58:31.430 [ERROR] 010-6 : The date range is invalid. Invalid date range: [2021-01-01 00:00:00,2021-02-01 00:00:00]. Valid range is: [1993-01-01 12:00:00,2020-12-31 12:00:00].

Ammonium:

RuntimeError: python -m motuclient --motu https://nrt.cmems-du.eu/motu-web/Motu --service-id BALTICSEA_ANALYSISFORECAST_BGC_003_007-TDS --product-id dataset-bal-analysis-forecast-bio-dailymeans --longitude-min 9.0416 --longitude-max 30.2087 --latitude-min 53.0083 --latitude-max 65.892 --date-min "2021-01-01 00:00:00" --date-max "2021-02-01 00:00:00" --variable nh4 --out-dir /media/share/data/be6a6fc1acdd615625ea11a572f7a77d/_dataset/Ammonium/ --out-name AmmoniumBalticmodelNetCDF2021-01-01to2021-02-01.nc --user mjaouen --pwd Azerty123456 --depth-min 0 --depth-max 20, 
2022-09-01 15:19:49.863 [ERROR] 010-20 : The limit of file size is exceeded. Please narrow your request.

eastward_Water_current:

RuntimeError: python -m motuclient --motu https://nrt.cmems-du.eu/motu-web/Motu --service-id BALTICSEA_ANALYSISFORECAST_PHY_003_006-TDS --product-id dataset-bal-analysis-forecast-phy-dailymeans --longitude-min 9.0416 --longitude-max 30.2087 --latitude-min 53.0083 --latitude-max 65.891 --date-min "2021-01-01 00:00:00" --date-max "2021-02-01 00:00:00" --variable uo --out-dir /media/share/data/23c8a1c70b46164eeff2995d34afae2e/_dataset/eastward_Water_current/ --out-name eastward_Water_currentBalticmodelNetCDF2021-01-01to2021-02-01.nc --user mjaouen --pwd Azerty123456 --depth-min 0 --depth-max 20,
2022-09-01 15:32:23.696 [ERROR] 010-20 : The limit of file size is exceeded. Please narrow your request.

As you can see, I've included the command used to download each dataset, to make the errors easier to reproduce.
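Since the scripts currently continue past these exceptions, the only reliable signal is whether the expected output file exists. A minimal sketch of a guard, assuming the download goes through a subprocess call to motuclient (the helper name and wiring are hypothetical, not the actual dataimport code):

```python
import subprocess
from pathlib import Path

def download_and_verify(motu_args, out_dir, out_name):
    """Run a motuclient command and fail loudly if the expected
    NetCDF file is missing afterwards (hypothetical wrapper; the
    real dataimport scripts build the command elsewhere)."""
    result = subprocess.run(motu_args, capture_output=True, text=True)
    expected = Path(out_dir) / out_name
    # Check the artifact itself, not just the exit code, since the
    # errors above currently pass without stopping the step.
    if result.returncode != 0 or not expected.is_file():
        raise RuntimeError(
            f"Download failed for {expected}:\n{result.stdout}\n{result.stderr}"
        )
    return expected
```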

ghost commented 2 years ago

For the "The limit of file size is exceeded" issue we can try to download data with a depth between 0 and 4 meters, instead of downloading data between 0 and 20 meters. In fact, I don't think we need data deeper than 2.8 meters. Another solution to this will be to download the daily data week by week instead of month by month, but I don't know if the data pretreatments are adapted to have data with one file per week.

For the "The date range is invalid" issue, we can't avoid to have this error, because the valid range changes over time. But we have to make sure that if the download fail in the archiving dataset, it will be downlaoded in the current dataset and vice versa. For example here it failed for the temperature in BALTICSEA_REANALYSIS_PHY_003_011-TDS the so the data have to be dowloaded in BALTICSEA_ANALYSISFORECAST_PHY_003_006-TDS dataset. Here the issue is that the user chosen a dataset that did not correspond to the year he choose. I see that in the web interface we can choose a single dataset, but for some years if we want to have all the data we have to download it from two different datasets.

qjutard commented 2 years ago

The concatenation part of the pretreatment can work no matter how much time is stored in each file. The only requirement is that the alphabetical order of the file names respects the chronological order.
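As an illustration, the zero-padded ISO dates already used in the file names (e.g. TemperatureBalticmodelNetCDF2021-01-01to2021-02-01.nc) guarantee this: a plain lexicographic sort is chronological. A sketch, assuming xarray handles the concatenation (the directory is a placeholder):

```python
from pathlib import Path
import xarray as xr  # assumption: xarray is used for the NetCDF pretreatment

# YYYY-MM-DD in the names makes alphabetical order == chronological order,
# whether each file holds a week, two weeks, or a month of daily means.
files = sorted(Path("/media/share/data/<run-id>/_dataset/Temperature").glob("*.nc"))
ds = xr.open_mfdataset([str(f) for f in files], combine="by_coords")
```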

For the second part, I think all we will do is make sure that the time range shown in dataCmd is accurate; then we can expect the user to choose the right datasets. Maybe we could change the A&F datasets to say "present - ~2 years" and "present", for example? I don't think we should implement anything regarding the use of two datasets.

ghost commented 2 years ago

I changed the code to download the data two weeks by two weeks when it's daily data. To make sure we don't have issues with the data size, I think we can bound the depth at 10 meters (as a maximum value); I don't think we will have algae longer than 7 meters. It's a good idea to change the information given to users in the interface to help them choose the dataset, but it will be hard to keep the time range accurate in dataCmd.
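For the depth bound, a small guard before building the motuclient command could look like this (a sketch with hypothetical names; the two-week chunking itself can reuse `weekly_ranges(days=14)` from the sketch above):

```python
MAX_DEPTH_M = 10.0  # cap: algae are not expected to exceed ~7 m

def clamp_depth(depth_min, depth_max, cap=MAX_DEPTH_M):
    """Bound the requested depth range so the response stays under
    the server's file size limit."""
    return max(0.0, depth_min), min(depth_max, cap)

# e.g. clamp_depth(0, 20) -> (0.0, 10.0), passed as --depth-min / --depth-max
```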