breedfides / airflow-etl

Implement radiation data fetching DAG (DWD CDC netCDF) #5

Closed gannebamm closed 6 months ago

gannebamm commented 9 months ago

Status Quo

Currently, only some POC DAGs are defined. There will be a meeting with bioinformatics researchers to define which environmental datasets are helpful for their analyses.

Expected behaviour

There is a ready-to-use DAG for the first environmental dataset. The first DAG shall extract data from DWD CDC:

radiation

The datasets are stored on the CDC FTP server as zipped netCDF files. The radiation data from 2000 onwards shall be downloaded, the zip archives extracted, and each netCDF clipped to the given lat/long position plus a 3 km radius.
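As a rough illustration of the extraction step described above, here is a minimal sketch using only the Python standard library. It assumes the zip archive has already been downloaded into memory; the function names `extract_netcdf_members` and `build_demo_archive`, and the member names inside the demo archive, are hypothetical and not taken from the repository or from the CDC server.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_netcdf_members(zip_bytes: bytes, target_dir: Path) -> list[Path]:
    """Extract every .nc member of an in-memory zip archive into target_dir."""
    extracted = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip anything that is not a netCDF file (e.g. readme files).
            if name.lower().endswith(".nc"):
                extracted.append(Path(archive.extract(name, path=target_dir)))
    return extracted


def build_demo_archive() -> bytes:
    """Build a tiny stand-in archive; the member names are hypothetical."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("radiation_200001.nc", b"placeholder bytes, not real netCDF")
        archive.writestr("README.txt", b"not a dataset")
    return buffer.getvalue()


# Demo: only the .nc member is extracted, the readme is ignored.
with tempfile.TemporaryDirectory() as tmp:
    paths = extract_netcdf_members(build_demo_archive(), Path(tmp))
    print([p.name for p in paths])
```

In a real DAG the bytes would come from the download task instead of `build_demo_archive`, and the extracted paths would be handed to the clipping task.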

gannebamm commented 9 months ago

We still need more information before we can define this issue further.

gannebamm commented 9 months ago

@brightemahpixida the first dataset was defined in today's meeting. See above.

There are more CDC netCDF datasets to follow. Those will be part of different issues. To give you a heads-up on the upcoming tasks:

PRIO 2:

gannebamm commented 9 months ago

@brightemahpixida if you need further information regarding the datasets, you can ping @vineetasharma105. She has experience with netCDF files.

brightemahpixida commented 9 months ago

Thanks @gannebamm - I'm having a look at this now, and I'll make sure to ping Vineeta if I have a question 👍

brightemahpixida commented 9 months ago

Hi @gannebamm, I have a couple of questions on this:

brightemahpixida commented 9 months ago

Hi @vineetasharma105 - I also need a bit of clarification on the netCDF file

vineetasharma105 commented 9 months ago

Hi Bright, if I understood your question correctly: the variable value is fetched for a certain 'time' and a certain 'Lat-Long' combination. Latitude and longitude are always used together (together they make a spatial point), and this point is what is used to clip based on the specified radius. Please let me know if I have misunderstood your question.
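To make the 'clip' idea above concrete, here is a minimal sketch in plain Python. It assumes each value is stored with a time stamp and a latitude/longitude pair in decimal degrees, and that "within the radius" means great-circle (haversine) distance; the record layout, the function names and the sample coordinates are all hypothetical and only serve as illustration, not as the implementation used in the DAG.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def clip_to_radius(records, center_lat, center_lon, radius_km=3.0):
    """Keep the records whose (lat, lon) point lies within radius_km of the centre."""
    return [
        r for r in records
        if haversine_km(center_lat, center_lon, r["lat"], r["lon"]) <= radius_km
    ]


# Hypothetical sample: one value per (time, lat, lon) combination.
records = [
    {"time": "2000-01", "lat": 52.00, "lon": 10.00, "value": 21.0},  # at the centre
    {"time": "2000-02", "lat": 52.00, "lon": 10.00, "value": 35.0},  # same point, later time
    {"time": "2000-01", "lat": 52.02, "lon": 10.00, "value": 20.5},  # roughly 2.2 km north
    {"time": "2000-01", "lat": 52.05, "lon": 10.00, "value": 19.8},  # roughly 5.6 km north
]

clipped = clip_to_radius(records, center_lat=52.0, center_lon=10.0, radius_km=3.0)
print(len(clipped), "of", len(records), "records fall inside the radius")
```

Latitude and longitude are always evaluated together as one point, and every time step of a point that falls inside the radius is kept. If the grid spacing of the dataset is coarser than the radius, very few or no grid points may fall inside the circle.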

gannebamm commented 9 months ago

I assume this new DAG we'll be creating will be included with the WCS and WFS DAGs (which we created in the first PR), or are we starting afresh since the title of this issue states that this is our first DAG?

You are right. I will change this issue's title. The WCS and WFS DAGs are the first DAGs provided. This one is the second.

brightemahpixida commented 9 months ago

Thanks @vineetasharma105 - I think I now understand it to a degree, but I may need a little clarification on the 'clip' concept. Would you be free for a short call later today (15:00)? If not, we could schedule the call for tomorrow.

Or how does that sound to you : )

vineetasharma105 commented 9 months ago

@brightemahpixida : Hi Bright. Sorry for the late response. Yes we can have a call today. Let me know how you want to have it.

brightemahpixida commented 9 months ago

Hi @vineetasharma105, thanks for the reply - will 14:00 or 15:00 work for you today? I don't think it's going to be a long session. I can set up a Webex link once I get your reply.

vineetasharma105 commented 9 months ago

@brightemahpixida : Yes, this time slot is good.

brightemahpixida commented 9 months ago

@vineetasharma105 Great, thanks. I just sent a link to your mailbox.

brightemahpixida commented 9 months ago

Hi @vineetasharma105 @gannebamm - I have the radiation DAG ready, please can you review the linked PR and see if this implementation fits the use-case we discussed over the past couple of weeks.

Also, will it be alright if we save the output as parquet files, or do you have something in mind for what the clipped output should look like?

I'm still running more extensive tests on this feature, but so far the latitude/longitude inputs I've used for testing haven't returned any clipped output. Could you share some inputs that you expect will definitely produce results for the radiation geo-data?

To give you a run-through again of the clipping workflow, it goes like this:

brightemahpixida commented 9 months ago

I just added a new commit to the PR :)