amazon-science / earth-forecasting-transformer

Official implementation of Earthformer
Apache License 2.0
359 stars 61 forks

How to run my own dataset within Earthformer #62

Open fizzking opened 10 months ago

fizzking commented 10 months ago

I want to use Earthformer to train on my own dataset and test it. What format should I process the data into, and which .py files should I prepare?

gaozhihan commented 9 months ago

Thanks for your question. You may want to refer to the simplest test case to verify if the shapes are aligned correctly. Please note that this test script is from my fork, which has not been merged into this repo.

fizzking commented 9 months ago

I ran the command in the README: python3 -m pytest. It produced the following error. What does it mean? [screenshot]

gaozhihan commented 9 months ago

I'm not sure if you are using the correct script in my fork, but you don't need to run pytest. Please simply try python ROOT_DIR/tests/test_cuboid.py.

fizzking commented 9 months ago

I ran the test code as you suggested, and the result shows that the model is missing parameters. How do I specify these two model parameters? [screenshot]

gaozhihan commented 9 months ago

You should pass the arguments to CuboidTransformerModel as in

https://github.com/gaozhihan/earth-forecasting-transformer/blob/a5c07f22ec53ba577d679e0a3be8eb7e77d3e82c/tests/test_cuboid.py#L24-L29

fizzking commented 9 months ago

Thank you very much for your patient reply! I successfully ran this code.

fizzking commented 9 months ago

The test_cuboid.py you provided tests the data. Do I need to write my own training code based on your train_cuboid_nbody.py?

gaozhihan commented 9 months ago

Yes, please feel free to refer to [train_cuboid_nbody.py](https://github.com/amazon-science/earth-forecasting-transformer/blob/7732b03bdb366110563516c3502315deab4c2026/scripts/cuboid_transformer/nbody/train_cuboid_nbody.py) and train_cuboid_sevir.py for implementing your own training script. The main task is to implement your own LightningDataModule to replace the original one:

https://github.com/amazon-science/earth-forecasting-transformer/blob/7732b03bdb366110563516c3502315deab4c2026/scripts/cuboid_transformer/sevir/train_cuboid_sevir.py#L485-L506
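Whatever the custom LightningDataModule looks like, it has to hand the model pairs of input and target sequences. The following is only a minimal sketch of that slicing step in NumPy, not code from the repository. It assumes the data is already a (T, H, W, C) array, and `make_windows`, `in_len`, `out_len`, and `stride` are hypothetical names chosen for illustration. In a real data module these pairs would be wrapped in a Dataset and served from `train_dataloader()`, `val_dataloader()`, and `test_dataloader()`.

```python
import numpy as np


def make_windows(series, in_len, out_len, stride=1):
    """Slice a (T, H, W, C) array into (input, target) sequence pairs.

    Hypothetical helper, not part of Earthformer.
    Returns arrays of shape (N, in_len, H, W, C) and (N, out_len, H, W, C).
    """
    total = in_len + out_len
    if series.shape[0] < total:
        raise ValueError("series is shorter than in_len + out_len")
    # Start index of every window that fits entirely inside the series.
    starts = range(0, series.shape[0] - total + 1, stride)
    inputs = np.stack([series[s:s + in_len] for s in starts])
    targets = np.stack([series[s + in_len:s + total] for s in starts])
    return inputs, targets


# Toy data: 20 time steps on an 8 x 8 grid with 3 channels.
data = np.arange(20 * 8 * 8 * 3, dtype=np.float32).reshape(20, 8, 8, 3)
x, y = make_windows(data, in_len=10, out_len=5, stride=2)
print(x.shape, y.shape)
```

The shapes of one input/target pair are what would then be passed as the model's input and target shapes, so they should be checked against the model configuration before training.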

fizzking commented 9 months ago

My data is a CSV file with M rows and N columns. The columns are: time, latitude, longitude, several predictive factors, and the target outputs affected by those factors. Each row therefore holds the predictive factors and targets at one time and one location. However, my latitudes and longitudes do not lie on a regular grid as in the ENSO example you provided, so I cannot arrange the data into an ENSO-style array of shape (Time, lat, lon, number of predictive factors). Does the data have to be on a regular lat x lon grid before it can be fed into Earthformer?

gaozhihan commented 9 months ago

Earthformer is designed to handle regularly gridded data. For your case, you may want to use masks to indicate missing values, if the data is not too sparse.

fizzking commented 9 months ago

Are there any examples for reference?
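The masking approach suggested above could be sketched roughly as follows. This is not code from the Earthformer repository, only an illustration under stated assumptions: the CSV rows have already been loaded into arrays, the time column has been converted to an integer step index, and the grid is defined by bin edges you choose. The helper `rows_to_grid` and all of its parameter names are hypothetical. Rows falling into the same cell at the same time step are averaged, and empty cells are filled with zeros and flagged in the mask.

```python
import numpy as np


def rows_to_grid(times, lats, lons, values, lat_edges, lon_edges):
    """Bin scattered observations onto a regular (T, H, W, C) grid.

    Hypothetical helper, not part of Earthformer.
    times: (M,) integer time-step index per row
    lats, lons: (M,) coordinates per row
    values: (M, C) predictors/targets per row
    lat_edges, lon_edges: ascending bin edges defining the grid cells
    Returns (grid, mask); mask is True where a cell has at least one row.
    """
    n_t = int(times.max()) + 1
    n_h, n_w = len(lat_edges) - 1, len(lon_edges) - 1
    n_c = values.shape[1]

    # Cell index of each row. Points on or beyond the outer edges are
    # clipped into the edge cells; filter them out beforehand if unwanted.
    i = np.clip(np.digitize(lats, lat_edges) - 1, 0, n_h - 1)
    j = np.clip(np.digitize(lons, lon_edges) - 1, 0, n_w - 1)

    # Accumulate sums and counts per cell (np.add.at handles repeated indices).
    sums = np.zeros((n_t, n_h, n_w, n_c), dtype=np.float64)
    counts = np.zeros((n_t, n_h, n_w), dtype=np.int64)
    np.add.at(sums, (times, i, j), values)
    np.add.at(counts, (times, i, j), 1)

    mask = counts > 0
    grid = np.zeros_like(sums)  # empty cells stay at zero
    grid[mask] = sums[mask] / counts[mask][:, None]
    return grid, mask


# Four toy rows: two time steps, a 2 x 2 grid, two value columns.
times = np.array([0, 0, 1, 1])
lats = np.array([10.2, 10.4, 11.7, 10.1])
lons = np.array([100.1, 100.3, 101.9, 101.5])
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
lat_edges = np.array([10.0, 11.0, 12.0])
lon_edges = np.array([100.0, 101.0, 102.0])

grid, mask = rows_to_grid(times, lats, lons, values, lat_edges, lon_edges)
print(grid.shape, int(mask.sum()))
```

How the mask is then used is a design choice for the training script. Two common options are to append it as an extra input channel or to restrict the loss to cells where the mask is True. If most cells end up empty, the data may be too sparse for this approach.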