eQTL-Catalogue / eQTL-SumStats

eQTL Catalogue Summary Statistics

eQTL Catalogue data release #70

Closed by ljwh2 2 months ago

ljwh2 commented 4 months ago

✅ Run the tsv2hdf Nextflow pipeline
  -- ✅ Generate HDF5 files
  -- ✅ Generate metadata file
✅ Copy HDF5 files
✅ Copy metadata file
✅ Copy the .tsv files to FTP
✅ Test release files (HDF5 & .tsv files)
✅ GitLab release
✅ Update Confluence doc: https://www.ebi.ac.uk/seqdb/confluence/display/GOCI/eQTL+Summary+Statistics
✅ Review by Kaur

karatugo commented 4 months ago

The HDF5 files are missing from the private FTP. Asked Kaur whether we should run the pipeline on the EBI compute cluster.

karatugo commented 4 months ago

Asked Kaur for clarification on the Nextflow pipeline.

karatugo commented 4 months ago

Gathered *.cc.tsv.gz files from sumstats and unzipped them.

karatugo commented 4 months ago

Submitted the tsv2hdf pipeline as a SLURM job:

         JOBID PARTITION NAME                                                    USER     ST TIME       NODES  NODELIST
       7943535 datamover tsv2hdf-eqtl-sumstats-70                                gwas_lsf R  2:49       1      codon-dm-07

Update: the job failed on time and memory constraints because neither limit was specified at submission. This is a side effect of the SLURM migration.

karatugo commented 3 months ago

Submitted a SLURM job that creates the HDF5 files. Expect the files in 2 days in this directory: /hps/nobackup/parkinso/spot/gwas/scratch/eqtl-sumstats-70/hdf5

         JOBID PARTITION NAME                                                    USER     ST TIME       NODES  NODELIST
       8342181 datamover tsv2hdf-eqtl-sumstats-70                                gwas_lsf R  19:02      1      codon-dm-05

Update: failed because the wall-clock limit was exceeded.

karatugo commented 3 months ago

I have resubmitted the SLURM job with expanded time and memory limits (7 days, 32GB). Over two days, it generated 19 files using 4GB each. Estimating roughly two weeks to produce 195 files with the same constraints. I will request an extension next Monday if necessary.

         JOBID PARTITION NAME                                                    USER     ST TIME       NODES  NODELIST
      12602601 datamover tsv2hdf-eqtl-sumstats-70-take_2                         gwas_lsf R  0:07       1      codon-dm-07

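
Since the first attempt failed with no limits set, here is a minimal sketch of how the resubmission could state them explicitly. It only builds and prints the `sbatch` command rather than running it. The job name and partition come from the queue output above, and the 7-day and 32 GB limits from the comment; the script name `run_tsv2hdf.sh` is hypothetical, as is the helper function itself.

```python
import shlex


def build_sbatch_command(job_name, partition, time_limit, memory, script):
    """Build an sbatch argument list with explicit wall-clock and memory limits."""
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--partition={partition}",
        f"--time={time_limit}",  # days-hours:minutes:seconds
        f"--mem={memory}",       # memory per node
        script,
    ]


cmd = build_sbatch_command(
    job_name="tsv2hdf-eqtl-sumstats-70-take_2",
    partition="datamover",
    time_limit="7-00:00:00",  # 7 days, as stated in the comment
    memory="32G",             # 32 GB, as stated in the comment
    script="run_tsv2hdf.sh",  # hypothetical wrapper script name
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would submit the job; printing it first makes the limits easy to review before anything reaches the scheduler.
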
karatugo commented 3 months ago

Update (14/05): still running; 17 files generated so far.

[gwas_lsf@codon-dm-08 goci1306]$ ls -ltr /hps/nobackup/parkinso/spot/gwas/scratch/eqtl-sumstats-70/hdf5/data/
total 11237028
-rw-r--r-- 1 gwas_lsf spot  641870177 May 13 11:48 QTD000564.cc.h5
-rw-r--r-- 1 gwas_lsf spot 1171725085 May 13 14:46 QTD000565.cc.h5
-rw-r--r-- 1 gwas_lsf spot  384931120 May 13 15:05 QTD000566.cc.h5
-rw-r--r-- 1 gwas_lsf spot  651121794 May 13 15:55 QTD000567.cc.h5
-rw-r--r-- 1 gwas_lsf spot  404137316 May 13 16:15 QTD000568.cc.h5
-rw-r--r-- 1 gwas_lsf spot  249738809 May 13 16:24 QTD000569.cc.h5
-rw-r--r-- 1 gwas_lsf spot  635153165 May 13 17:22 QTD000570.cc.h5
-rw-r--r-- 1 gwas_lsf spot  165125845 May 13 17:27 QTD000571.cc.h5
-rw-r--r-- 1 gwas_lsf spot  321946272 May 13 17:42 QTD000572.cc.h5
-rw-r--r-- 1 gwas_lsf spot  237479017 May 13 17:50 QTD000573.cc.h5
-rw-r--r-- 1 gwas_lsf spot  929367552 May 13 19:18 QTD000574.cc.h5
-rw-r--r-- 1 gwas_lsf spot 1229204319 May 13 22:17 QTD000575.cc.h5
-rw-r--r-- 1 gwas_lsf spot  361636592 May 13 22:37 QTD000576.cc.h5
-rw-r--r-- 1 gwas_lsf spot  541988245 May 13 23:17 QTD000577.cc.h5
-rw-r--r-- 1 gwas_lsf spot  284813726 May 13 23:27 QTD000578.cc.h5
-rw-r--r-- 1 gwas_lsf spot 2345812458 May 14 09:31 QTD000579.cc.h5
-rw-r--r-- 1 gwas_lsf spot  950666626 May 14 11:41 QTD000580.cc.h5
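
As a rough cross-check of the remaining run time, the timestamps in the listing above can be extrapolated. This is only a sketch: it assumes the files keep arriving at the average rate seen so far, which the widely varying file sizes make uncertain. The total of 195 files comes from the earlier comment; the year in the timestamps is a placeholder because the listing does not show one, and `estimate_remaining` is an illustrative helper, not part of the pipeline.

```python
from datetime import datetime, timedelta


def estimate_remaining(done, total, first_done, last_done):
    """Extrapolate the time left from the average gap between finished files."""
    if done < 2:
        raise ValueError("need at least two finished files to measure a rate")
    per_file = (last_done - first_done) / (done - 1)  # average gap per file
    return per_file * (total - done)


# Completion times of the first and last files in the listing.
# The year is a placeholder; only month, day, and time are known.
first = datetime(2000, 5, 13, 11, 48)
last = datetime(2000, 5, 14, 11, 41)

remaining = estimate_remaining(done=17, total=195, first_done=first, last_done=last)
print(f"about {remaining.total_seconds() / 86400:.1f} days left")
```

If the figure comes out longer than the time left on the job's limit, that is a signal to request an extension early.
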
karatugo commented 3 months ago

132 files done. Asked the Codon team to extend the job by another 7 days.

karatugo commented 3 months ago

All files done.
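
One way to confirm that nothing was skipped is to compare the input and output directories by dataset ID. The sketch below assumes the unzipped inputs are named `<ID>.cc.tsv` and sit in a single directory, which this thread does not state; the `.cc.h5` output naming is taken from the listing above. The function name is illustrative.

```python
from pathlib import Path


def missing_outputs(tsv_dir, hdf5_dir, input_suffix=".cc.tsv", output_suffix=".cc.h5"):
    """Return dataset IDs that have an input file but no HDF5 output, sorted."""
    inputs = {
        p.name[: -len(input_suffix)] for p in Path(tsv_dir).glob(f"*{input_suffix}")
    }
    outputs = {
        p.name[: -len(output_suffix)] for p in Path(hdf5_dir).glob(f"*{output_suffix}")
    }
    return sorted(inputs - outputs)


# Example call (both paths are placeholders for the real directories):
# print(missing_outputs("path/to/unzipped_tsv", "path/to/hdf5/data"))
```

An empty list means every input has a matching HDF5 file. It says nothing about whether each file is complete, so it complements the release-file tests in the checklist rather than replacing them.
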

karatugo commented 3 months ago

Submitted SLURM job 15938725, which generates the metadata from https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/master/data_tables/dataset_metadata_upcoming.tsv

karatugo commented 3 months ago

Metadata generation complete.

karatugo commented 3 months ago

Copying the data and metadata (qtl_metadata.h5) to the private FTP /otar_sumstats is complete.

karatugo commented 3 months ago

Copying the data and metadata (qtl_metadata.h5) to the public FTP /eqtl is complete.

karatugo commented 3 months ago

Submitted batch job 15954227 to copy the .tsv files in sumstats and susie to the public FTP at /nfs/ftp/public/databases/spot/eQTL/

karatugo commented 3 months ago

Copying the .tsv files is complete.

karatugo commented 3 months ago

Find the updated docs here: https://www.ebi.ac.uk/seqdb/confluence/display/GOCI/eQTL+Summary+Statistics

karatugo commented 2 months ago

Kaur checked the files and announced the release on the eQTL Catalogue website as well: https://www.ebi.ac.uk/eqtl/Release_notes/