enram / data-repository

Data quality assessment
https://enram.github.io/data-repository/
MIT License

Upload flyway data batch 1 + verify coverage of vp data #40

Closed peterdesmet closed 6 years ago

peterdesmet commented 7 years ago

I have looked at the file coverage of the vp data I have on the USB stick and added that information to the "IssueListing_WesternFlyway" Google Spreadsheet (tab "Upload to repository").

For example: if there are 4 scans per hour, then 4*24 = 96 vp files are expected per day:

[Screenshot, 2017-09-20 15:35:49]

While we have all files for most days, almost every radar has days with missing files. A few radars even have too many files.
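As a rough illustration of the check described above, here is a minimal Python sketch that counts vp files per day and compares the count with the expected number. The function names and the idea of passing in a list of file timestamps are assumptions made for this example; they are not taken from the spreadsheet or from the repository code.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def expected_files_per_day(interval_minutes: int) -> int:
    """Number of vp files expected per day for a given scan interval."""
    return (24 * 60) // interval_minutes


def classify_daily_coverage(
    timestamps: Iterable[datetime], interval_minutes: int = 15
) -> Dict:
    """Count files per day and label each day as ok, missing, or excess.

    Days without any file do not appear in the result; they would need to
    be checked against the expected date range separately.
    """
    expected = expected_files_per_day(interval_minutes)
    counts = Counter(ts.date() for ts in timestamps)
    report: Dict = {}
    for day, n in sorted(counts.items()):
        if n == expected:
            status = "ok"
        elif n < expected:
            status = "missing"
        else:
            status = "excess"
        report[day] = (n, expected, status)
    return report
```

With a 15-minute interval (4 scans per hour), `expected_files_per_day(15)` gives 96, so a day with 90 files would be labelled "missing" and a day with 97 files "excess".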

@CeciliaNilsson709 @plieper, can you assess per radar if that is a known issue we can ignore or if this is something to look into? I've added a column coverage_remark to add information. Once a radar contains "OK" in that column, I'll upload it to the data repository. 👌

adokter commented 7 years ago

OPERA requires countries to send data at least once every 15 minutes, but it is allowed (and better) if countries send data more frequently. So the 144 are countries sending data every 10 minutes (6*24 = 144), I think.

CeciliaNilsson709 commented 7 years ago

Cool, thanks Peter!

Yes, all (?) the data in Baltrad is per 15 min, but the 144s here are the Polish data (which we did not get through Baltrad) and they are every 10 min.

I'll take a look at the other issues soon!

peterdesmet commented 7 years ago

All vp files I got are now uploaded to the repository! 👍 Example: http://enram.github.io/data-repository/?prefix=pl/leg/2016/

Before I did, I removed the midnight Oct 10 file, which was there for some French radars, and a duplicate midnight Oct 9 file for some Swedish radars. All remaining coverage issues are missing files: if I get those files, I can add them to the repository.

Once the coverage is verified, I'll create the monthly zip files (which we'll also use on Zenodo).

@stijnvanhoey, I noticed the coverage is calculated for those new uploads. I guess that was triggered by the daily transfer (and it looks at the whole repo)? Neat!

stijnvanhoey commented 7 years ago

Coverage screening on the daily transfer checks the whole repository, thanks ;-)

stijnvanhoey commented 7 years ago

However, we're lacking any data on Baltrad for the last days, so no new data...

CeciliaNilsson709 commented 7 years ago

I have looked through the coverage and in most cases where there is data missing it's no problem (from the perspective of the flyway analysis). Either it's very few files missing per day (so it won't affect the overall means in the analysis), or there is data missing from daytime only (which does not matter to us), or the radar is already excluded anyway.

There are however some cases we need to take a look at:

@peterdesmet I will have more data to add soon, I'll wetransfer it to you?

baptischmi commented 7 years ago

@CeciliaNilsson709 do the time windows with no data differ from the exclusion times I wrote in the Excel table? Can you please mail me the exact time windows I should check again? For frbla and frlep, but maybe also for frcae, frace, detur, and deeis.

peterdesmet commented 7 years ago

@CeciliaNilsson709 yep, you can WeTransfer it to me. 👍

CeciliaNilsson709 commented 7 years ago

@baptischmi

frlep: The coverage table says that there is a large chunk of data missing on Oct 09, but I can't see it in the issue listing or in the plots I have.

frbla: Data missing on Sep 19 according to the coverage table.

I guess if the data is missing at the very end and the very beginning, it would have been hard to see during the plotting? Could you replot/take a look at the data from those two so we can confirm what time window is missing? It would be good to know if it's something we can ignore or if it needs to be taken into account.

For the others I don't think there is that much we can do? I am guessing that if the data is missing, you don't have it either. It's just good to know exactly what is missing so we can take it into account in the analysis.

baptischmi commented 7 years ago

@CeciliaNilsson709 frlep: last data plotted is on October 9 at about 16:00/17:00. No data afterwards. frbla: on my plots, data start on Sept 19 at about 7:00. I refer to the data Liesbeth in the zip-file named "output_v0.3.13.zip" (10/05/2017).

I may not have used the function "regularize" for processing vplists for French data (but did for DWD data). I only discovered this function later on, I think. Aren't you using VP h5 files (prior to time "regularization") for the vp-processing? Why is the function "regularize" important for uploading data to the data repository or for vp-processing?

CeciliaNilsson709 commented 7 years ago

@baptischmi

Ok, thanks, that's good to know! Makes sense that it was at the very beginning and end. The regularize() only affects the plotting, making it easier to see missing data. So it doesn't really matter for the processing or uploading.

adokter commented 7 years ago

@CeciliaNilsson709 @baptischmi Actually regularize also affects the data: it projects the data on a regular time grid that you can specify yourself. Not all radars send data on the same time interval, so to line them up you can use this function.

Data gaps won't be visible on irregular timeseries, as the plotting function will take the last profile to represent the entire data gap (it makes no prior assumptions on the expected data interval)
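To make the idea of projecting profiles onto a regular time grid concrete, here is a small Python sketch. It is not bioRad's implementation (bioRad is an R package); the function name `regularize_times`, the nearest-profile matching, and the half-interval tolerance are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def regularize_times(
    times: List[datetime],
    start: datetime,
    end: datetime,
    interval: timedelta,
) -> List[Tuple[datetime, Optional[datetime]]]:
    """Project observation times onto a regular grid from start to end.

    Each grid step is paired with the nearest observation if that
    observation lies within half an interval of the step (assumed
    tolerance). Otherwise the step is paired with None, which makes
    the data gap explicit instead of silently bridging it.
    """
    half = interval / 2
    result = []
    step = start
    while step <= end:
        nearest = min(times, key=lambda obs: abs(obs - step), default=None)
        if nearest is not None and abs(nearest - step) <= half:
            result.append((step, nearest))
        else:
            result.append((step, None))
        step += interval
    return result
```

On an irregular series with profiles at 00:00, 00:15 and 01:00, a 15-minute grid would pair 00:30 and 00:45 with `None`, so the gap shows up rather than being filled by the last available profile.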

CeciliaNilsson709 commented 7 years ago

@adokter @baptischmi

Ok, sorry for that. But it's only applied after the processing, right? So in this case doing it differently would only affect the plotting/screening, not the .h5 files that are being uploaded (or used in the analysis, where it's all treated the same)?

adokter commented 7 years ago

@CeciliaNilsson709 Yes it's only post-processing in bioRad, has nothing to do with h5 files

peterdesmet commented 6 years ago

So, the first batch of flyway data I received has been successfully uploaded:

frbou
frmcl
frmom
seang
sevar
nldbl
nldhl
silis
sipas
czbrd
czska
fianj
fiika
fikes
fikor
fikuo
filuo
fipet
fiuta
fivan
fivim
frabb
frale
frave
frbla
frbol
frbor
frcae
frche
frcol
frgre
frlep
frmtc
frnan
frnim
frniz
fropo
frpla
frtou
frtra
frtre
frtro
searl
sease
sehud
sekir
sekkr
selek
selul
seosu
seovi
sevil
deess
defbg
defld
dehnr
deisn
demem
depro
detur
deumd
deboo
dedrs
deeis
deneu
denhb
deoft
deros
plbrz
plgda
plleg
plpoz
plram
plswi

The expected date range for these is 2016-09-19/2016-10-09, except for the 16 German radars (dexxx) for which it is 2016-09-09/2016-10-09. I'll create a new issue for the second batch (which e.g. includes the pt radars).
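As a sketch of how these expected date ranges could be used when verifying coverage, the following Python snippet lists the days on which a radar has no files at all. The function names and the rule "radar code starts with de" are assumptions for this example, based only on the dexxx naming mentioned above.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# Date ranges as stated for the first batch of flyway data.
DEFAULT_RANGE = (date(2016, 9, 19), date(2016, 10, 9))
GERMAN_RANGE = (date(2016, 9, 9), date(2016, 10, 9))


def expected_range(radar: str) -> Tuple[date, date]:
    """Return the expected (start, end) dates; German radars start earlier."""
    return GERMAN_RANGE if radar.startswith("de") else DEFAULT_RANGE


def expected_days(radar: str) -> List[date]:
    """All days in the expected range, both ends included."""
    start, end = expected_range(radar)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def missing_days(radar: str, days_with_files: Iterable[date]) -> List[date]:
    """Expected days for which no file is present."""
    present = set(days_with_files)
    return [d for d in expected_days(radar) if d not in present]
```

Under these assumptions a radar such as frbla is expected to cover 21 days and a German radar such as detur 31 days.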

@CeciliaNilsson709 if the result of the verification you're doing is that we might have to update/change some files, let me know. I'll also leave it to you to close this issue, maybe you want it open as a reminder to verify those radars that still have issues. ☺️

CeciliaNilsson709 commented 6 years ago

I don't think there is really that much we can do about it, we have the data that we have, but it's really good to have the overview. Since it's all being tracked in the spreadsheet, you can close here.