StanfordBioinformatics / pulsar_lims

A LIMS for ENCODE submitting labs.

ENCODE data submission: Heart sample submission #953

Open twang15 opened 2 years ago

twang15 commented 2 years ago

https://docs.google.com/spreadsheets/d/1Q3H4wSm2KThEJZJHdrWA815SbmZ5sg2B/edit#gid=784802929

Hi Tao,

Could you start with the submission of the last heart samples (all the tabs except “files” and “multiomic series”)?

Thank you, Annika

twang15 commented 2 years ago

Hi Annika,

I have fixed several mistakes in the Donor submission sheet; the headers marked in red are the ones I corrected. The Donor tab is also missing a column that ENCODE requires: life_stage.

Thanks, Tao

twang15 commented 2 years ago

Hi Annika,

I have finished the submission except for files and multiomic series. Could you continue by filling in the replicate and experiment columns in the files tab?

Thanks, Tao

twang15 commented 2 years ago

Hi Annika,

Thanks for your confirmation. I fixed most of them by adding a new column G.

One exception in the original column H, which you filled in: cells H49 and H52 are the same. Is this a mistake?

Best, Tao

twang15 commented 2 years ago

Hi Annika,

The paired_end and paired_with columns have problems:

paired_end should be one of 1, 2, or 1,2.

paired_with should be filled in for rows with paired_end equal to 2.

Could you update the spreadsheet?

Thanks, Tao
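The two rules above could be checked before submission with something like the following. This is only a minimal sketch, not part of pulsar_lims or encode_utils: the function name `check_pairing` and the row layout (one dict per spreadsheet row, keyed by column header) are assumptions, and it assumes every row in the sheet is expected to carry a paired_end value.

```python
from typing import Dict, List

# Values the comment above lists as valid for paired_end.
ALLOWED_PAIRED_END = {"1", "2", "1,2"}


def check_pairing(rows: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems found in paired_end / paired_with.

    `rows` is a hypothetical representation of the files tab: one dict per
    data row, keyed by column header. Row numbers start at 2 because row 1
    is assumed to be the header row.
    """
    problems = []
    for row_number, row in enumerate(rows, start=2):
        paired_end = (row.get("paired_end") or "").strip()
        paired_with = (row.get("paired_with") or "").strip()
        if paired_end not in ALLOWED_PAIRED_END:
            problems.append(
                f"row {row_number}: paired_end must be 1, 2, or 1,2 "
                f"(got {paired_end!r})"
            )
        elif paired_end == "2" and not paired_with:
            problems.append(
                f"row {row_number}: paired_with is required when paired_end is 2"
            )
    return problems
```

Rows exported from the spreadsheet as CSV could be loaded with `csv.DictReader` and passed in as a list.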

twang15 commented 2 years ago

2022-06-06 17:46:14,478:eu_debug: <<<<<< POST file record michael-snyder:UW228_R3_L2_ATAC_fastq To DCC with URL https://www.encodeproject.org/file and this payload:

{
  "aliases": ["michael-snyder:UW228_R3_L2_ATAC_fastq"],
  "award": "/awards/UM1HG009442/",
  "dataset": "ENCSR450VTB",
  "file_format": "fastq",
  "file_size": 0,
  "lab": "/labs/michael-snyder/",
  "md5sum": "d41d8cd98f00b204e9800998ecf8427e",
  "output_type": "reads",
  "paired_end": "2",
  "paired_with": "michael-snyder:UW228_R1_L2_ATAC_fastq",
  "platform": "/platforms/OBI:0002630/",
  "read_length": 50,
  "replicate": "7cff62ff-13cd-48fb-bd2c-285ddd62a762",
  "run_type": "paired-ended",
  "submitted_file_name": "/oak/stanford/scg/prj_ENCODE/Staging2/220321_A00509_0475_AH7NW7DRX2-linlab1_031522_scATAC/linlab1_031522_scATAC-ATGGTCGC-CGACATAG-GATTCGCT-TCCAGATA/linlab1_031522_scATAC-ATGGTCGC-CGACATAG-GATTCGCT-TCCAGATA_S6_L002_R3_001.fastq.gz"
}

2022-06-06 17:46:14,998:eu_debug: {'@type': ['HTTPConflict', 'Error'], 'status': 'error', 'code': 409, 'title': 'Conflict', 'description': 'There was a conflict when trying to complete your request.', 'detail': "Keys conflict: [('alias', 'md5:d41d8cd98f00b204e9800998ecf8427e')]"}

2022-06-06 17:46:14,999:eu_debug: >>>>>>GET michael-snyder:UW228_R3_L2_ATAC_fastq From DCC with URL https://www.encodeproject.org/michael-snyder:UW228_R3_L2_ATAC_fastq/?format=json&datastore=database

2022-06-06 17:46:15,182:eu_debug: NOT FOUND
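A note on the conflicting key: d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of zero bytes of input, and the payload in this log also reports "file_size": 0. That may mean the staged file is empty, which would be worth ruling out before anything else. A minimal sketch of a local pre-check follows; the helper names `file_md5` and `precheck` are hypothetical and are not part of encode_utils.

```python
import hashlib
import os

# MD5 of empty input; equals "d41d8cd98f00b204e9800998ecf8427e".
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def precheck(path):
    """Return size, md5sum, and an emptiness flag for a file to be submitted."""
    size = os.path.getsize(path)
    md5sum = file_md5(path)
    return {
        "file_size": size,
        "md5sum": md5sum,
        "is_empty": size == 0 or md5sum == EMPTY_MD5,
    }
```

Running `precheck` on each submitted_file_name path before posting would flag zero-byte files early instead of surfacing them as a 409 conflict.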

twang15 commented 2 years ago

Hi Annika,

There seems to be a serious problem:

If you search the ‘files’ tab for the key d41d8cd98f00b204e9800998ecf8427e, you can see that there are 4 files with this md5sum:

Row 22: /oak/stanford/scg/prj_ENCODE/Staging2/220321_A00509_0475_AH7NW7DRX2-linlab1_031522_scATAC/linlab1_031522_scATAC-AGACTTTC-CCGAGGCA-GATGCAGT-TTCTACAG/linlab1_031522_scATAC-AGACTTTC-CCGAGGCA-GATGCAGT-TTCTACAG_S8_L002_I1_001.fastq.gz

Row 117: /oak/stanford/scg/prj_ENCODE/Staging2/220321_A00509_0475_AH7NW7DRX2-linlab1_031522_scATAC/linlab1_031522_scATAC-ATGGTCGC-CGACATAG-GATTCGCT-TCCAGATA/linlab1_031522_scATAC-ATGGTCGC-CGACATAG-GATTCGCT-TCCAGATA_S6_L002_R3_001.fastq.gz

Row 305: /oak/stanford/scg/prj_ENCODE/Staging2/220321_A00509_0475_AH7NW7DRX2-linlab1_031522_scATAC/linlab1_031522_scATAC-AGGGATGA-CTTCTGTT-GAATGCAC-TCCACACG/linlab1_031522_scATAC-AGGGATGA-CTTCTGTT-GAATGCAC-TCCACACG_S2_L001_R1_001.fastq.gz

Row 322: /oak/stanford/scg/prj_ENCODE/Staging2/220321_A00509_0475_AH7NW7DRX2-linlab1_031522_scATAC/linlab1_031522_scATAC-ACGTTCAC-CAAGGTCT-GTTAAGTG-TGCCCAGA/linlab1_031522_scATAC-ACGTTCAC-CAAGGTCT-GTTAAGTG-TGCCCAGA_S3_L002_I1_001.fastq.gz

These four files have different names but their contents are the same. How could this be possible?

Best, Tao
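The manual search described above (looking up one md5sum in the files tab) can be generalized to find every md5sum shared by more than one row. This is a rough sketch under assumptions: `duplicate_md5_groups` is a hypothetical helper, and each record is assumed to be a dict with a `row` number and an `md5sum` value taken from the sheet.

```python
from collections import defaultdict


def duplicate_md5_groups(records):
    """Map each md5sum that appears more than once to the rows that share it.

    `records` is an iterable of dicts with "row" and "md5sum" keys, a
    hypothetical stand-in for the rows of the files tab.
    """
    rows_by_md5 = defaultdict(list)
    for record in records:
        rows_by_md5[record["md5sum"]].append(record["row"])
    return {md5: rows for md5, rows in rows_by_md5.items() if len(rows) > 1}
```

Any md5sum returned here would trigger the same "Keys conflict" error on the second POST, so it is cheaper to catch the duplicates in the sheet first.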

twang15 commented 2 years ago

Yes. It means that some sequencing results were mixed up. Some of the datasets have already been submitted and may need to be revoked later. Until we figure out what really happened, we should hold off on further submissions.

We need to contact John for a thorough investigation. Do you have the sequencing request numbers for all files?