Test process ENA brokering- JAX_RNAseq4_ExEm_brain

anu-shiva commented 2 weeks ago

ENA brokering process for the KOLF line datasets, where the fastq.gz files must be without ChrY. These files are stored in the morphic-bio-processing bucket under nested folders.

Testing starts with JAX's second dataset: [JAX_RNAseq2_ExtraEmbryonic/]. The metadata spreadsheet is in the ProdDB folder: v7_JAX_RNAseq2_Prod.xlsx

Process:

1. Dipayan to write the script to sync fastq.gz files without ChrY to ENA webin-dev.

   - Location of files: morphic-bio-processing/morphic-jax/filtered/release110-gencode44/
   - The folders are named as: (_basename__${fastq_gz_file})_val, each carrying the Read1 and Read2 fastq.gz files, e.g. *R1*.filtered.fastq.gz, *R2*.filtered.fastq.gz
   - Files should be renamed for clarity: .filtered --> .ChrYexcluded

2. Renamed files from Step 1 to be brokered to ENA.
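
The rename rule in Step 1 could be sketched as below. This is only an illustration, not the actual sync script: the function name `chry_excluded_name` and the example key are hypothetical, and it assumes every file to be brokered ends in `.filtered.fastq.gz`.

```python
from pathlib import PurePosixPath

# Suffixes taken from the rename rule above (.filtered --> .ChrYexcluded).
OLD_SUFFIX = ".filtered.fastq.gz"
NEW_SUFFIX = ".ChrYexcluded.fastq.gz"


def chry_excluded_name(key: str) -> str:
    """Return the object key with only the file-name suffix renamed.

    Hypothetical helper: the folder part of the key is left untouched.
    """
    path = PurePosixPath(key)
    if not path.name.endswith(OLD_SUFFIX):
        raise ValueError(f"not a filtered fastq.gz file: {key}")
    new_name = path.name[: -len(OLD_SUFFIX)] + NEW_SUFFIX
    return str(path.with_name(new_name))


# Example with a made-up folder and file name:
print(chry_excluded_name("sampleA_val/sampleA_R1.filtered.fastq.gz"))
```

Raising on unexpected names, instead of silently skipping them, makes it harder for an unfiltered file to be brokered by mistake.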

dipayan1985 commented 2 weeks ago

Hi @anu-shiva /cc @gabsie. We will copy the sequencing files from this path: s3://morphic-bio-processing/morphic-jax/JAX_RNAseq2_ExtraEmbryonic/filtered/release110-gencode44/

There are 44 folders, one per library prep. In contrast, there are 82 library prep biomaterials, which should correspond to 164 files (82 x 2 reads). Is this expected?

Please confirm. The brokering code needs to be adapted accordingly.
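
To make the mismatch concrete, here is a rough arithmetic sketch. It assumes each folder holds exactly one Read1 and one Read2 file, as described in the issue; the function name `count_gap` is hypothetical.

```python
def count_gap(n_folders: int, n_library_preps: int, reads_per_prep: int = 2) -> dict:
    """Compare the files implied by the folders with the files the metadata implies.

    Assumption: every folder contains exactly `reads_per_prep` fastq.gz files.
    """
    expected_files = n_library_preps * reads_per_prep
    found_files = n_folders * reads_per_prep
    return {
        "expected_files": expected_files,
        "found_files": found_files,
        "missing_files": expected_files - found_files,
        "missing_folders": n_library_preps - n_folders,
    }


# Numbers from this comment: 44 folders vs 82 library prep biomaterials.
print(count_gap(n_folders=44, n_library_preps=82))
```

Under that assumption, the 44 folders account for 88 files, so 38 folders (76 files) would be missing relative to the 82 library preps.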

gabsie commented 2 weeks ago

Hey @anu-shiva, please email Hong to check and fix his filtered files folder for JAX study 2.

anu-shiva commented 2 weeks ago

The new testing should proceed with the fourth dataset from JAX: [JAX_RNAseq4_ExEm_brain/]. The spreadsheet to refer to is v7_JAX_RNAseq4_Prod.xlsx in the ProdDB folder.

dipayan1985 commented 1 week ago

Done, studies 4, 5 and 6 are brokered to ENA.

ebi-ait / submissions-workstream

Test process ENA brokering- JAX_RNAseq4_ExEm_brain #30