OceanBioME / OceanBioMEArtifacts

Data files for OceanBioME.jl examples and tests

Issue downloading data #2

Closed johnryantaylor closed 1 year ago

johnryantaylor commented 2 years ago

I'm having a problem when I clone the example data repository. The .md file looks fine, but the .nc files are very small and just have something like the text below inside them:

    version https://git-lfs.github.com/spec/v1
    oid sha256:af763cd97748a66a1b9c00bb87d961cb6dc045c8e0cfe6763313dc61e021b7cd
    size 158483

Do either of you know what might be causing this?
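For anyone hitting the same symptom: the text above has the shape of a Git LFS pointer file (a `version` line, an `oid` line, and a `size` line) standing in for the real data. A minimal sketch of how one might check a checkout for such pointers is below. The function names `parse_lfs_pointer` and `find_lfs_pointers` are hypothetical helpers written for this illustration, not part of OceanBioME or Git LFS.

```python
from pathlib import Path

# First line of every Git LFS pointer file (v1 spec).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(path):
    """Return {'oid': ..., 'size': ...} if `path` is an LFS pointer, else None."""
    with open(path, "rb") as f:
        head = f.read(1024)  # pointer files are tiny, so 1 KiB is plenty
    if not head.startswith(LFS_HEADER):
        return None  # looks like real (binary) data, not a pointer
    fields = {}
    for line in head.decode("utf-8", errors="replace").splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    size = fields.get("size", "")
    return {"oid": fields.get("oid"), "size": int(size) if size.isdigit() else None}


def find_lfs_pointers(root, pattern="*.nc"):
    """List files under `root` matching `pattern` that are still LFS pointers."""
    return sorted(p for p in Path(root).rglob(pattern) if parse_lfs_pointer(p))
```

If `find_lfs_pointers(".")` returns any paths, those files were cloned without their LFS content. With Git LFS installed, running `git lfs pull` inside the clone is the usual way to fetch the real files.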

johnryantaylor commented 2 years ago

Ok, I figured this one out. I had to run "git lfs clone" to download the files. A plain "git clone" only downloaded these small text files with the info above. I didn't find this very intuitive, and I think other users might struggle with it. I think we should change the examples so that they don't use large data files (e.g. replace the mixed layer depth and PAR with idealized analytical functions like the ones that @syou83syou83 is creating). We could have a separate example that shows how to read *.nc files and do the interpolation, with instructions in the example file for how users can download the data from the source instead of providing the files ourselves. I'll leave this comment open for now.
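To make the "idealized analytical functions" idea concrete, here is a rough sketch of what such forcing might look like: a single annual cosine for surface PAR and for mixed layer depth. This is only an illustration in Python, not the functions @syou83syou83 is writing. Every number in it (means, amplitudes, peak times, units) is a made-up placeholder, and the names `annual_cycle`, `surface_par`, and `mixed_layer_depth` are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_cycle(t, mean, amplitude, t_peak):
    """Cosine with a one-year period that reaches mean + amplitude at t_peak."""
    return mean + amplitude * math.cos(2 * math.pi * (t - t_peak) / SECONDS_PER_YEAR)


def surface_par(t):
    """Idealized surface PAR at time t (seconds since 1 Jan).

    Placeholder values: mean 60, amplitude 50, peaking mid-year.
    """
    return max(0.0, annual_cycle(t, mean=60.0, amplitude=50.0,
                                 t_peak=0.5 * SECONDS_PER_YEAR))


def mixed_layer_depth(t):
    """Idealized mixed layer depth (positive, in metres) at time t.

    Placeholder values: deepest (280 m) in late winter, shallowest (20 m)
    half a year later.
    """
    return annual_cycle(t, mean=150.0, amplitude=130.0,
                        t_peak=0.2 * SECONDS_PER_YEAR)
```

Because the forcing is a closed-form function of time, an example built this way needs no data files at all, which would sidestep the LFS problem entirely.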

jagoosw commented 2 years ago

I also found this quite unintuitive when I tried to download it myself and, given the other issue as well, agree that this isn't the best way to set up the examples.

syou83syou83 commented 2 years ago

I can make a similar subtropical.jl example, which will have a much shallower mixed layer depth and thus small data files. Also, the file subpolar_BGC.nc is not actually necessary, since it's only used to compare our model's results with Mercator's. I suggest removing it from the repo.

jagoosw commented 2 years ago

The real issue with the LFS thing is that the physics file is huge because it's much higher resolution; without that we wouldn't need to use LFS.

syou83syou83 commented 2 years ago

I forked sichen-org/OceanBioME_example_data to my personal profile, where I deleted the subpolar_BGC.nc, subpolar_chl.nc, and subpolar_physics.nc files and left only the subpolar folder, where the surface PAR is stored. But even in this case, I still need LFS to clone the data.

jagoosw commented 2 years ago

Yeah, that's because I set the repo up to store all .nc files in LFS, so as it's currently configured they're all stored in LFS. How do you think it would be best set up?
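For context, tracking "all .nc files" in LFS is normally recorded in the repository's `.gitattributes` as a line like `*.nc filter=lfs diff=lfs merge=lfs -text` (this is what `git lfs track "*.nc"` writes). The repo's actual `.gitattributes` isn't shown in this thread, so the sample text below is an assumption. The sketch uses a hypothetical helper, `lfs_tracked_patterns`, to list which patterns are routed through LFS.

```python
def lfs_tracked_patterns(gitattributes_text):
    """Return the path patterns that .gitattributes routes through Git LFS."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# Assumed example contents, not copied from the actual repository.
sample = """\
# data files
*.nc filter=lfs diff=lfs merge=lfs -text
*.md text
"""
print(lfs_tracked_patterns(sample))
```

A pattern such as `*.nc` catches every NetCDF file regardless of size, which is why even the small PAR files still required LFS. Narrowing the pattern to the large files only, or dropping LFS once the large files are gone, would avoid that.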

syou83syou83 commented 2 years ago

I will make some examples:

  1. a LOBSTER model without carbonate chemistry (so that we don't need the T and S datasets), using idealized MLD and surface PAR;
  2. an example that shows how to read *.nc files and do the interpolation, using a small sample of S or T data;
  3. a LOBSTER model with carbonate chemistry and PAR as a field, using a small sample of the surface PAR, T, and S datasets and idealized MLD;
  4. if helpful, an example using a small sample of the surface PAR, T, S, Chl, and real MLD datasets to show how PAR can be a function.

I will also add instructions or a readme to outline all the examples and show how to download the data. Does that sound good to you, @johnryantaylor @jagoosw?

jagoosw commented 2 years ago

Hi Si,

I've nearly finished putting together a much smaller data file of T, S, MLD, and PAR so you could use that for 3? Otherwise, this seems like a good idea.

Jago

jagoosw commented 2 years ago

Okay, I've added a new file called subpolar.nc and removed all the others, and turned off LFS, so it should be much more straightforward to use now.

subpolar.nc has data from the same time period (2020), but from a single point from Mercator, with just the area average of the PAR. So it's just four 1D data series.
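With only 1D time series, the interpolation step in the examples reduces to interpolating each series in time. A minimal Python sketch of linear interpolation is below, assuming the times and values have already been read from the file into plain lists. The helper name `make_linear_interpolant` and the sample numbers are hypothetical, and the real examples may well use a different interpolation scheme.

```python
from bisect import bisect_right


def make_linear_interpolant(times, values):
    """Return f(t) that linearly interpolates `values` sampled at `times`.

    `times` must be strictly increasing. Outside the sampled range the
    nearest end value is held constant (an assumption of this sketch).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")

    def interpolate(t):
        if t <= times[0]:
            return values[0]
        if t >= times[-1]:
            return values[-1]
        i = bisect_right(times, t)  # times[i-1] <= t < times[i]
        t0, t1 = times[i - 1], times[i]
        w = (t - t0) / (t1 - t0)
        return (1 - w) * values[i - 1] + w * values[i]

    return interpolate


# Made-up sample: three temperature readings at days 0, 10, and 20.
temperature = make_linear_interpolant([0.0, 10.0, 20.0], [1.0, 3.0, 2.0])
```

The same helper could be applied once each to the T, S, MLD, and PAR series, giving four functions of time that a model can sample at any time step.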

jagoosw commented 2 years ago

I also noticed that the original data files had missing data, because Iceland is in the area they were taken from. The missing values weren't filtered out, which probably artificially pulled the temperature lower (because the fill value is about -3 degrees) and had some other unknown effects on the salinity/MLD. I've not finished running a test with the new data file, but it might produce very slightly different results.
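To illustrate the effect described here, the sketch below compares a naive area average with one that first drops fill values. The numbers are invented for illustration, and `masked_mean` is a hypothetical helper. In a real NetCDF file, the fill value would normally be read from the variable's `_FillValue` attribute instead of being hard-coded.

```python
import math


def masked_mean(values, fill_value, tol=1e-6):
    """Mean of `values`, ignoring entries equal to `fill_value` (within tol)."""
    valid = [v for v in values if not math.isclose(v, fill_value, abs_tol=tol)]
    if not valid:
        return math.nan  # every point was missing
    return sum(valid) / len(valid)


# Invented temperatures for four grid cells; the third one is over land.
temps = [8.0, 9.0, -3.0, 10.0]
naive = sum(temps) / len(temps)               # the land cell drags the mean down
filtered = masked_mean(temps, fill_value=-3.0)
```

With these made-up numbers the naive mean is 6.0 while the filtered mean is 9.0, which shows how even a few unmasked land cells can bias an area average towards the fill value.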

johnryantaylor commented 2 years ago

Great, thanks for catching these things! (In reply to Jago Strong-Wright's comment of Aug 18, 2022, about the unfiltered missing data around Iceland.)

johnryantaylor commented 2 years ago

@syou83syou83, the examples that you describe above sound great!