GEUS-Glaciology-and-Climate / pypromice

Process AWS data from L0 (raw logger) through Lx (end user)
https://pypromice.readthedocs.io
GNU General Public License v2.0

cloud cover difference between edition 3 and edition 4 #94

Closed fsn1995 closed 1 year ago

fsn1995 commented 1 year ago

Hi, I previously downloaded the PROMICE data to compare the hourly cloud cover. The figure below was plotted using the old edition 3 data I downloaded last year. (figure: cloudcoverBox)

Yesterday I saw that there's a new edition 4, which includes new AWSs. So I tried to make a new plot with the latest edition, but the result looks different. (figure: untitled)

I tried to exclude the newly added AWSs and keep the temporal coverage the same as edition 3. However, the edition 4 result is pretty much unchanged and still differs from the edition 3 plot. I wonder if there has been any major change in how the cloud cover fraction is determined? Thank you so much in advance for your help!

For reference, the old edition 3 was downloaded using the data_url.csv file from (https://dataverse.geus.dk/api/access/datafile/:persistentId?persistentId=doi:10.22008/FK2/IPOHT5/ALZ64J). The link is no longer valid as it's now replaced by a new version.

PennyHow commented 1 year ago

We have tried our best to keep the cloud cover calculation consistent between Edition 3 and Edition 4. However, we have migrated our workflow from IDL/GDL to Python and I wonder if there is a discrepancy.

Which station/s are you plotting these from?

The data_url.csv you are linking to is from a former pypromice Dataverse file. Please see the updated data_url.csv on the pypromice Dataverse entry, or see the file on the pypromice repo for the most up-to-date file. These are now updated with Edition 4 download urls. Edition 3 data can still be found here

fsn1995 commented 1 year ago

Hi @PennyHow, thank you so much for your clarification. Yes, the first figure was plotted using the old edition 3 data. The link is no longer valid, but I have a local copy, attached here: data_urls.csv

I then downloaded the data using the latest csv file in this GitHub repo, the same one you shared here. The second figure was plotted using the data downloaded from the latest links this morning. I also tried to exclude the new AWSs in edition 4, but the result is still quite different from the previous one. Hope this helps! Thanks again for your help! I am just curious about the difference.

fsn1995 commented 1 year ago

Sorry, I noticed that I put the wrong figure earlier. Now I have corrected it. Both plots are showing the cloud cover fraction in June-August for all the available data. I dropped lines without valid "time", "cc", "gps_lat", "gps_lon". I also tried to keep the temporal consistency with V3 and didn't add the new stations not included in the previous edition.

PennyHow commented 1 year ago

Could you tell me which stations you have been using in your plots? Are these plots from cloud cover at all stations?

I have been looking at differences in cc and the variables used to derive it (air temperature t_u, downwelling longwave radiation dlr) at individual stations (mainly NUK_K). Most discrepancies appear to arise because variables are rounded in Edition 3 before cc is derived, whereas in Edition 4 we do not round values until the end of the processing workflow.

This does not explain why there are such big discrepancies between Edition 3 and Edition 4 though. It may be a particular station that is causing this, which I will keep exploring.
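
The rounding-order effect mentioned above can be illustrated with a minimal sketch. The `derive_cc` function below is a stand-in formula chosen for illustration only, not code taken from pypromice or the Edition 3 workflow; its coefficients, the rounding precisions, and the sample values are all assumptions.

```python
def derive_cc(t_u: float, dlr: float) -> float:
    """Stand-in cloud cover estimate (illustrative assumption only).

    t_u: air temperature in deg C; dlr: downwelling longwave in W m-2.
    """
    t_k = t_u + 273.15
    lr_overcast = 5.67e-8 * t_k**4                 # black-body emission
    lr_clear = 9.36508e-6 * lr_overcast * t_k**2   # assumed clear-sky emission
    cc = (dlr - lr_clear) / (lr_overcast - lr_clear)
    return min(max(cc, 0.0), 1.0)                  # clip to the 0-1 range


def cc_round_first(t_u: float, dlr: float, ndigits: int = 1) -> float:
    """Round the inputs, then derive cc (the order described for Edition 3)."""
    return derive_cc(round(t_u, ndigits), round(dlr, ndigits))


def cc_round_last(t_u: float, dlr: float, ndigits: int = 2) -> float:
    """Derive cc from unrounded inputs, then round (the Edition 4 order)."""
    return round(derive_cc(t_u, dlr), ndigits)


# Made-up sample values, not real station data
t_u, dlr = -3.1449, 251.26
early = cc_round_first(t_u, dlr)
late = cc_round_last(t_u, dlr)
print(f"rounded first: {early:.4f}, rounded last: {late:.4f}")
```

With inputs like these the two orders differ only in the third decimal place, which is consistent with rounding accounting for small discrepancies but not large ones.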

fsn1995 commented 1 year ago

Thanks a lot for your help! Yes, here's the csv file I used for the edition 3 data: data_urls_edition3.csv

The next file is the edition 4 list. I manually deleted some AWSs that didn't exist in edition 3: data_urls_edition4 - Copy.csv

In the postprocessing, I dropped rows without valid data (i.e., "time", "cc", "gps_lat", "gps_lon" in edition 4 and "Year", "MonthOfYear", "DayOfMonth", "HourOfDayUTC", "CloudCover", "LatitudeGPSdegN", "LongitudeGPSdegW" in edition 3) and limited the time to June-August. I haven't looked at specific stations yet, but I can also take a look on my side later. I hope this is helpful!
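
The postprocessing described above can be sketched roughly as follows for edition 4. This is a minimal sketch assuming the data have been read into a pandas DataFrame with the edition 4 column names quoted above; the function name and the sample rows are made up for illustration. For edition 3, the column list would be the edition 3 names instead, and a timestamp would first need to be assembled from the year, month, day and hour columns.

```python
import pandas as pd

# Edition 4 columns that must be present and valid (names from the comment above)
REQUIRED = ["time", "cc", "gps_lat", "gps_lon"]


def summer_valid_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing required values, then keep June-August only."""
    out = df.copy()
    # Unparseable or missing timestamps become NaT and are dropped below
    out["time"] = pd.to_datetime(out["time"], errors="coerce")
    out = out.dropna(subset=REQUIRED)
    return out[out["time"].dt.month.isin([6, 7, 8])]


# Made-up rows, not real station data
demo = pd.DataFrame(
    {
        "time": ["2021-06-15 12:00", "2021-07-15 12:00", "2021-01-15 12:00",
                 None, "2021-08-15 12:00"],
        "cc": [0.5, None, 0.2, 0.9, 0.8],
        "gps_lat": [64.0] * 5,
        "gps_lon": [50.0] * 5,
    }
)
summer = summer_valid_rows(demo)
print(summer)
```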

fsn1995 commented 1 year ago

@PennyHow Hi, I checked the data and found something unusual. Take NUK_K as an example: the link in data_urls_edition3.csv (the old file I downloaded last year) is the same as the one on Dataverse. The link to the hourly NUK_K edition 3 data is https://dataverse.geus.dk/api/access/datafile/:persistentId?persistentId=doi:10.22008/FK2/8SS7EW/E4SZZT.

However, in the updated data_url.csv, the link to the hourly NUK_K edition 4 data is https://dataverse.geus.dk/api/access/datafile/:persistentId?persistentId=doi:10.22008/FK2/IW73UU/DJF1H2. On Dataverse itself, the link is different: https://dataverse.geus.dk/api/access/datafile/:persistentId?persistentId=doi:10.22008/FK2/IW73UU/9P2O8Y.

I downloaded the NUK_K edition 4 data again, using both links, and compared each download with the edition 3 data. The results are quite different.

Linear regression against the edition 3 data, using the edition 4 data from data_url.csv: (figure: v4url)

Linear regression against the edition 3 data, using the edition 4 data from Dataverse. The mean difference (edition 4 - edition 3) is 0 (std=0.0142), which is minor. (figure: v4dataverse)

So I would almost guess it's the data_url file that needs to be updated. (I have not yet checked all the other AWSs, though).
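
For anyone wanting to repeat this kind of check, the comparison above can be sketched along these lines. This is a minimal sketch assuming the two cc series have already been aligned on the same timestamps; the function name and the sample numbers are made up and are not the NUK_K values.

```python
import numpy as np


def compare_editions(e3, e4):
    """Regress edition 4 on edition 3 and summarise their difference."""
    e3 = np.asarray(e3, dtype=float)
    e4 = np.asarray(e4, dtype=float)
    # Use only timestamps where both editions have a value
    ok = ~(np.isnan(e3) | np.isnan(e4))
    e3, e4 = e3[ok], e4[ok]
    slope, intercept = np.polyfit(e3, e4, 1)   # least-squares straight line
    diff = e4 - e3
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "mean_diff": float(diff.mean()),
        "std_diff": float(diff.std()),
    }


# Made-up, already-aligned values for illustration only
e3_cc = [0.10, 0.40, np.nan, 0.70, 1.00]
e4_cc = [0.11, 0.41, 0.55, 0.71, 1.01]
stats = compare_editions(e3_cc, e4_cc)
print(stats)
```

A slope near 1, an intercept near 0 and a small spread in the differences would suggest the two files hold essentially the same data; a poor fit would point to a mismatched file, as with the data_url.csv link here.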

PennyHow commented 1 year ago

Yes, I think you have cracked it. It looks like the urls in our data_urls.csv are wrong. Let me fix this and then you can try your cloud cover plots again. I'll update them now and let you know when the correct urls are there.

PennyHow commented 1 year ago

The most up-to-date URLs are now available here. I'm just waiting for someone to review the pull request so that it can be merged into our main branch.

Can you let me know how your plots look now using these new URLs?

fsn1995 commented 1 year ago

@PennyHow Hi, thank you so much for your help. Here is the updated cc plot, and it looks good. (figure: v4Updated) I also did a quick comparison of cc and albedo at 2 random AWSs against the edition 3 data, and the agreement is good. I think we can close this issue now. Thanks again for your help!

PennyHow commented 1 year ago

You are so welcome. Thanks for highlighting this to us!