OceanParcels / virtualship

Authentic tool and teaching material about sea-based research
https://virtualship.oceanparcels.org/
MIT License

CLI: Get data #68

Status: Open. VeckoTheGecko opened this issue 1 month ago.

VeckoTheGecko commented 1 month ago

`virtualship fetch`: Downloads the data needed for the simulation as an explicit step, so that users can do it separately. If virtualship is run without this step, it will fetch the data itself.
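The behaviour described above (an explicit fetch step, with an automatic fetch as a fallback at run time) could look roughly like the minimal sketch below. `ensure_data`, the `data_dir` layout and the `fetch` callback are hypothetical names, not virtualship's actual API, and treating a missing or empty directory as "no data yet" is an assumption.

```python
from pathlib import Path
from typing import Callable


def ensure_data(data_dir: Path, fetch: Callable[[Path], None]) -> bool:
    """Hypothetical helper: fetch data into data_dir only if it is missing or empty.

    Returns True if a fetch was triggered, False if existing data was reused.
    """
    # Assumption: any file in data_dir means the data was already downloaded.
    if data_dir.is_dir() and any(data_dir.iterdir()):
        return False
    data_dir.mkdir(parents=True, exist_ok=True)
    fetch(data_dir)  # e.g. the logic currently in download_data.py
    return True
```

Both `virtualship fetch` and the main run command could call the same helper, so the download logic lives in one place and a second run reuses what is already on disk.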

VeckoTheGecko commented 1 month ago

Also see if we can use one environment for everything:

Data can be downloaded with the `download_data.py` script. For now, a different conda env is needed for downloading; see the comments in the script.

iuryt commented 2 weeks ago

Is someone working on this? If not, I can try to help. Should we add new variables to `schedule.yaml`, such as a `bbox` for the `area_of_interest`? The rest would basically be adapting `scripts/download_data.py` into the CLI command here: https://github.com/OceanParcels/virtualship/blob/6f843e969f51cbe6cce0aa374889d3a9b0e49693/src/virtualship/cli/commands.py#L48-L50

Let me know your thoughts.

VeckoTheGecko commented 2 weeks ago

I'm happy to work on this as well.

> Should we add new variables to `schedule.yaml`, such as a `bbox` for the `area_of_interest`?

I think that would be good, but instead of putting it in the config, maybe we could pass it as arguments: `virtualship fetch --bbox-min=lat,lon --bbox-max=lat,lon`. If they don't provide them (i.e., they just run `virtualship fetch`), it could suggest a bounding box based on their schedule plus some buffer. I don't think we can rely purely on the waypoints in the schedule, since I assume students would be changing them on the fly throughout a class, and we wouldn't want them to have to redownload the data mid-exercise.
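A minimal sketch of the proposed flags using the standard library's `argparse`. The flag names follow the proposal above; `parse_latlon`, `suggest_bbox`, `resolve_bbox` and the 1-degree default buffer are hypothetical, and reading the waypoints from the schedule is left out.

```python
import argparse
from typing import Iterable, Sequence

LatLon = tuple[float, float]


def parse_latlon(text: str) -> LatLon:
    """Parse a 'lat,lon' string (e.g. '10.5,-20') into a pair of floats."""
    try:
        lat_str, lon_str = text.split(",")
        return float(lat_str), float(lon_str)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected 'lat,lon', got {text!r}") from None


def suggest_bbox(waypoints: Iterable[LatLon], buffer_deg: float) -> tuple[LatLon, LatLon]:
    """Box around all waypoints, padded by buffer_deg; latitude clamped to [-90, 90]."""
    lats, lons = zip(*waypoints)
    bbox_min = (max(-90.0, min(lats) - buffer_deg), min(lons) - buffer_deg)
    bbox_max = (min(90.0, max(lats) + buffer_deg), max(lons) + buffer_deg)
    return bbox_min, bbox_max


# Hypothetical parser for the proposed `virtualship fetch` flags.
parser = argparse.ArgumentParser(prog="virtualship fetch")
parser.add_argument("--bbox-min", type=parse_latlon, metavar="LAT,LON")
parser.add_argument("--bbox-max", type=parse_latlon, metavar="LAT,LON")


def resolve_bbox(
    args: argparse.Namespace, waypoints: Sequence[LatLon], buffer_deg: float = 1.0
) -> tuple[LatLon, LatLon]:
    """Use the user's box if given, otherwise suggest one from the schedule waypoints."""
    if args.bbox_min is None and args.bbox_max is None:
        return suggest_bbox(waypoints, buffer_deg)
    if args.bbox_min is None or args.bbox_max is None:
        parser.error("--bbox-min and --bbox-max must be given together")  # exits
    return args.bbox_min, args.bbox_max
```

Passing values joined with `=` (e.g. `--bbox-min=-10,-20`) is the safest form when the first coordinate is negative, since argparse may otherwise read `-10,-20` as an option name. The sketch does not handle boxes that cross the antimeridian.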

@ammedd, in terms of data fetching, is the `download_data.py` script all that is needed? Was there any other data fetching as part of virtualship?

EDIT: Hmm, there's also the question of the time domain. Perhaps it would be easiest to add both the spatial and the temporal domain to the schedule config file.
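If the spatial and temporal domain go into the schedule config, they might be validated along the lines of this sketch. The field names (`lat_min`, `start_time`, etc.) and the `AreaOfInterest` class are hypothetical, not an agreed `schedule.yaml` layout, and parsing the YAML itself is omitted. The `contains` check is one possible way to warn students when an edited waypoint falls outside the data they already downloaded.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AreaOfInterest:
    """Hypothetical spatial + temporal domain, as it might be read from schedule.yaml."""

    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    start_time: datetime
    end_time: datetime

    def __post_init__(self) -> None:
        # Reject inverted boxes and empty time ranges early.
        if not (self.lat_min < self.lat_max and self.lon_min < self.lon_max):
            raise ValueError("bbox minimum must be below maximum")
        if self.start_time >= self.end_time:
            raise ValueError("start_time must be before end_time")

    def contains(self, lat: float, lon: float, time: datetime) -> bool:
        """True if a waypoint lies inside the domain in both space and time."""
        return (
            self.lat_min <= lat <= self.lat_max
            and self.lon_min <= lon <= self.lon_max
            and self.start_time <= time <= self.end_time
        )
```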

iuryt commented 2 weeks ago

Yes, I lean more towards adding this to the config file. For example, if we have an experiment that relies only on drifters, the study area and time range will probably be considerably larger than those covered by the schedule.

If you haven't started working on this, I can initiate a PR and we can collaborate from there. What do you think?

VeckoTheGecko commented 2 weeks ago

@iuryt Sounds good! Happy for you to open a PR, and I can jump in during review.

ammedd commented 2 weeks ago

The `download_data` script is all I used before. It was based on an area of interest derived from the sampling stations, but indeed, it would be nice to extend this a bit. In the case of Argo/drifter deployments, it downloaded a separate dataset covering an area 3 degrees larger on each side, to allow for 3 additional weeks of data from the deployed instruments.
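The padding described above (3 degrees on each side, 3 extra weeks) can be expressed as a small transformation of the domain. This is a sketch under assumptions: the `Domain` tuple and `expand_for_drifters` are hypothetical names, the extra weeks are assumed to extend the end of the time range, and longitudes are not wrapped at ±180°.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Domain(NamedTuple):
    """Hypothetical spatial + temporal extent of a dataset to download."""

    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    start_time: datetime
    end_time: datetime


def expand_for_drifters(
    d: Domain, pad_deg: float = 3.0, extra_time: timedelta = timedelta(weeks=3)
) -> Domain:
    """Pad the box on every side and extend the end time for drifting instruments."""
    return Domain(
        lat_min=max(-90.0, d.lat_min - pad_deg),  # clamp to valid latitudes
        lat_max=min(90.0, d.lat_max + pad_deg),
        lon_min=d.lon_min - pad_deg,  # no wrapping at the antimeridian
        lon_max=d.lon_max + pad_deg,
        start_time=d.start_time,
        end_time=d.end_time + extra_time,  # assumption: extra weeks come after the cruise
    )
```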