Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Packaging structure #77

Closed by yellowcap 5 months ago

yellowcap commented 9 months ago

I would like to add dependencies, docs, and tests to the pipeline and the landcover-based sampling, but that will require an updated structure for the code.

So far we have three functionalities that could be sub-packages, plus an upcoming fourth:

  1. Model
  2. MGRS Sampling
  3. Data ingestion pipeline
  4. (upcoming) Benchmark

How should we organize these?

  1. Each in a subfolder in src?
  2. Or do we put everything together into one package and add the dependencies of all sub-packages to the conda env?
  3. ...

If it's not too much overhead for the model package, we could put everything under one.

weiji14 commented 9 months ago

As of commit c6a8365403fe0eb2297bc758a57b113d447ba0f0, we've more or less put the data processing scripts (MGRS Sampling and Data ingestion pipeline) under scripts/, and the neural network model code (Model) under src/. I'd say documentation can go in a top-level docs/ folder. Not sure about the tests, since there are some under src/tests/ for the model, but it should be possible to move them up a level to just tests/.

The conda environment can almost be a separate issue. So far, we've maintained a single conda environment.yml, but are you suggesting that we have separate environments for the data processing (which might not need heavy libs like PyTorch) and the model (which has all the PyTorch/CUDA libs)?
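If the environments were ever split this way, the data processing scripts would need to cope with PyTorch being absent. A minimal sketch of one way to check for an optional heavy dependency, using only the standard library; the helper names `has_module` and `require` are hypothetical and not part of this repository:

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return find_spec(name) is not None


def require(name: str) -> None:
    """Raise a readable error when an optional dependency is missing."""
    if not has_module(name):
        raise ImportError(
            f"'{name}' is not installed in this environment; "
            "it is only needed for the model code."
        )


if __name__ == "__main__":
    # "torch" is the import name of PyTorch; the result depends on the environment.
    print("PyTorch available:", has_module("torch"))
```

With a check like this, a lightweight data processing environment could skip the model-only code paths instead of failing at import time.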

yellowcap commented 9 months ago

I was thinking of grouping things by topic, so that there is clear naming and folder structure. The model should also live under its own name, making it clear that it's the model.

What about:

src/model
src/pipeline
src/sampling
src/benchmark

And yes, then put all tests under test/...

For the environment, agreed, it could be a separate issue. But for now let's leave it all in one and separate when necessary.
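To make the proposal concrete, here is a rough sketch that scaffolds this layout in a temporary directory. The four sub-package names come from the list above; the `__init__.py` files, the `scaffold` helper, and the empty `test/` folder are assumptions for illustration, not the repository's actual contents.

```python
import tempfile
from pathlib import Path

# Sub-package names taken from the proposed layout above.
SUBPACKAGES = ["model", "pipeline", "sampling", "benchmark"]


def scaffold(root: Path) -> list[Path]:
    """Create src/<name>/__init__.py for each sub-package and a test folder."""
    created = []
    for name in SUBPACKAGES:
        package_dir = root / "src" / name
        package_dir.mkdir(parents=True, exist_ok=True)
        init_file = package_dir / "__init__.py"
        init_file.touch()  # an empty file marks the folder as a package
        created.append(init_file)
    # The layout inside test/ is left open in the discussion, so it stays empty.
    (root / "test").mkdir(parents=True, exist_ok=True)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp)):
            print(path.relative_to(tmp))
```

Running it lists one `__init__.py` per sub-package, which is enough to see how imports would be grouped by topic.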

weiji14 commented 9 months ago

Ok, that folder structure looks good to me (but I'll also let others chime in). If we're putting all the source code (i.e. *.py files) under src/, we can keep the unit tests under src/tests/ instead of moving them to the top level. The documentation should still be at the top level under docs/ for visibility.

yellowcap commented 5 months ago

This is a large issue and it got stale. Let's keep improving the packaging in small increments.