Open timeu opened 7 years ago
The problem is that we don't have a single command to add a new study. How do we usually add them? I will merge the logic of compute_n_hits, import_publication_links and import_sample_number into one command and remove generate_complete_csv.
So I see it as follows: we should have an import_phenotypes command, which we can run by hand or as a cronjob, that goes to AraPheno, fetches the data and inserts new phenotypes. I wouldn't want to update the existing ones, because otherwise we would need to re-index all the associations. Also, the data on AraPheno usually doesn't get updated once it is published. This will make sure that we always have the published AraPheno phenotypes in AraGWAS as well. Eventually we should also have a cronjob that runs the GWAS pipeline for the new phenotypes (or for all the existing ones if a new genotype is released). But right now we will probably do this by hand. So, as you pointed out, we probably need an endpoint that takes an HDF5 file, creates a GWAS study connected to the phenotype, and indexes the associations.
Ok, I will delete the other commands and create a new one for new studies (as proposed in #31). However, the whole current pipeline relies on studies, phenotypes and HDF5 files always carrying the same id. Can we keep this assumption for the future? (i.e. will the file be named 289.hdf5?)
No, we can't. This is purely a coincidence, because we currently have a 1-1 mapping between phenotypes and GWAS studies (1 transformation, 1 method and 1 genotype). As soon as we introduce either a new method or a new genotype version, this no longer holds. I would design the command so that it takes the phenotype id, genotype id, method, transformation and an HDF5 file, and creates a new GWAS study (the id should be assigned automatically).
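To make the proposed interface concrete, here is a rough sketch of what such a command could look like. It is not the actual AraGWAS code: the command name `add_study`, the option names, the `Study` and `StudyRegistry` classes and the method values are all made up for illustration, and an in-memory registry stands in for the database. The point is only that the study id comes from the registry and is independent of the phenotype id and the file name.

```python
import argparse
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical study record; field names are assumptions."""
    id: int
    phenotype_id: int
    genotype_id: int
    method: str
    transformation: str
    hdf5_path: str


class StudyRegistry:
    """In-memory stand-in for the study table; it owns id assignment."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.studies = {}

    def create(self, phenotype_id, genotype_id, method, transformation, hdf5_path):
        # The id is taken from the counter, never from the phenotype or file name.
        study = Study(next(self._next_id), phenotype_id, genotype_id,
                      method, transformation, hdf5_path)
        self.studies[study.id] = study
        return study


def build_parser():
    # Option names are illustrative, not an existing CLI.
    parser = argparse.ArgumentParser(prog="add_study")
    parser.add_argument("--phenotype-id", type=int, required=True)
    parser.add_argument("--genotype-id", type=int, required=True)
    parser.add_argument("--method", required=True)
    parser.add_argument("--transformation", required=True)
    parser.add_argument("hdf5_file")
    return parser


def add_study(registry, argv):
    """Parse the arguments and register a new study with a fresh id."""
    args = build_parser().parse_args(argv)
    return registry.create(args.phenotype_id, args.genotype_id,
                           args.method, args.transformation, args.hdf5_file)
```

With this shape, two studies for the same phenotype (for example with different methods) get distinct ids, so nothing downstream can rely on the study id matching the phenotype id. Indexing the associations from the HDF5 file would happen after `create` and is left out here.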
Currently there are a lot of management commands:
Some of them were workarounds to get the data in. We should remove those. So far I think submit_to_datacite, setup_es, index_study, import_phenotypes are definitely required. Not sure about the others.
The import_phenotypes command should have an option to update the phenotype information if the phenotype already exists.
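A minimal sketch of how the insert-only default and the optional update could fit together. This is an assumption about the design, not existing code: the function signature, the `update_existing` flag and the record layout (dicts with an `id` key) are hypothetical, a plain dict stands in for the phenotype table, and fetching from AraPheno is left out.

```python
def import_phenotypes(fetched, store, update_existing=False):
    """Sync fetched phenotype records into store.

    fetched: iterable of dicts, each with at least an "id" key (assumed layout).
    store:   dict mapping phenotype id -> record, standing in for the database.
    Returns (created_ids, updated_ids).
    """
    created, updated = [], []
    for record in fetched:
        pid = record["id"]
        if pid not in store:
            # New phenotype: always insert.
            store[pid] = dict(record)
            created.append(pid)
        elif update_existing and store[pid] != record:
            # Existing phenotype: only touched when explicitly requested.
            store[pid] = dict(record)
            updated.append(pid)
    return created, updated
```

Returning the updated ids separately would let the caller decide whether the associations of those phenotypes need re-indexing, which is the cost mentioned earlier in this thread.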