StanfordBioinformatics / pulsar_lims

A LIMS for ENCODE submitting labs.

ENCODE data submission: a bunch of ChIPs #811

Closed twang15 closed 2 years ago

twang15 commented 2 years ago

https://docs.google.com/spreadsheets/d/1Hj5Al2J_F0sUUolG-dUQZqpEFPEl-YUm/edit#gid=795058038

Hi Tao,

Could you also check this spreadsheet? There are ChIPs that can be submitted. I think we talked about those when we met last time.

Thanks, Annika

twang15 commented 2 years ago

Hi Annika,

Sorry, I cannot remember anything about this submission. But it has been tracked on GitHub and I will start the submission:

https://github.com/StanfordBioinformatics/pulsar_lims/issues/811

Best, Tao

twang15 commented 2 years ago

cs-106 (https://pulsar-encode.herokuapp.com/chipseq_experiments/106) replaced by cs-568 https://pulsar-encode.herokuapp.com/chipseq_experiments/568

twang15 commented 2 years ago

Hi Ingrid and Annika,

This ChIP experiment https://www.encodeproject.org/experiments/ENCSR236JZN/ was released on August 3rd, 2020. However, I was able to add 2 more replicates (rep 3 and rep 4) but failed to add the fastq files.

The questions are:

  1. Why did I succeed in submitting more replicates after the ChIP had been released?
  2. Why did the submission of the fastq files fail while the replicate submission succeeded?

Annika, in this submission sheet https://docs.google.com/spreadsheets/d/1Hj5Al2J_F0sUUolG-dUQZqpEFPEl-YUm/edit#gid=795058038, you asked the question in row 5 for rep3/rep4: “why was this experiment repeated? It's released with minor audit?” Did you figure out the answer, or do we need to submit rep 3 and rep 4?

Best, Tao

twang15 commented 2 years ago

Missing biosample characterization:

https://www.encodeproject.org/experiments/ENCSR506SSQ/

twang15 commented 2 years ago

Missing biosample characterization:

https://www.encodeproject.org/experiments/ENCSR506SSQ/

IP posted for both biosamples

twang15 commented 2 years ago

Hi Annika,

We have new sequencing data from SREQ-430 for cs-152 (ENCSR774PVT). But its upstream experiment has been removed. Do we still want to submit the new data?

Thanks, Tao

twang15 commented 2 years ago

Hi Ingrid and Jennifer,

Is the ENCODE server under maintenance? I have been trying to submit a ChIP experiment: https://www.encodeproject.org/experiments/ENCSR548FLW/, but constantly see this error:

2022-02-28 09:53:54,440:ppy_debug: GET Biosample record with ID 13848: https://pulsar-encode.herokuapp.com/api/biosamples/13848 ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Thanks, Tao

twang15 commented 2 years ago

Hi Tao,

I haven't been able to find a problem on our end; we aren't seeing issues with submission from other users or on the team. Have you given the submission another try this afternoon?

Alternatively, is there possibly an issue with a mismatch of the submission metadata and the object type, or some other clash? I see that there's a released biosample that has an alias matching that Pulsar record ID in your error: https://www.encodeproject.org/biosamples/ENCBS447IFV/

Best, Ingrid

twang15 commented 2 years ago

Hi Ingrid,

Here is the latest error message. The portal detects a conflict caused by a duplicate file submission, but then it says the file cannot be found.

2022-03-01 12:43:28,942:eu_debug: <<<<<< POST file record michael-snyder:SREQ-430-GCCAAT_S3_L001_R1_001.fastq.gz To DCC with URL https://www.encodeproject.org/file and this payload:

{ "aliases": [ "michael-snyder:SREQ-430-GCCAAT_S3_L001_R1_001.fastq.gz" ], "award": "/awards/UM1HG009442/", "controlled_by": [ "ENCFF931PCB" ], "dataset": "ENCSR548FLW", "file_format": "fastq", "file_size": 2803379992, "flowcell_details": [ { "barcode": "GCCAAT", "lane": "" } ], "lab": "michael-snyder", "md5sum": "884b33332f913ffa08d374e8715c4e95", "output_type": "reads", "paired_end": "1", "platform": "encode:NovaSeq6000", "read_length": 100, "replicate": "1fd30eca-0e2d-4e85-a7eb-a87727b35acb", "run_type": "paired-ended", "submitted_file_name": "/oak/stanford/scg/prj_ENCODE/SREQ/430/SREQ-430-GCCAAT_S3_L001_R1_001.fastq.gz" }

2022-03-01 12:43:29,475:eu_debug: {'@type': ['HTTPConflict', 'Error'], 'status': 'error', 'code': 409, 'title': 'Conflict', 'description': 'There was a conflict when trying to complete your request.', 'detail': "Keys conflict: [('alias', 'md5:884b33332f913ffa08d374e8715c4e95')]"}

2022-03-01 12:43:29,476:eu_debug: >>>>>>GET michael-snyder:SREQ-430-GCCAAT_S3_L001_R1_001.fastq.gz From DCC with URL https://www.encodeproject.org/michael-snyder:SREQ-430-GCCAAT_S3_L001_R1_001.fastq.gz/?format=json&datastore=database

2022-03-01 12:43:29,624:eu_debug: NOT FOUND
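
For reference, the conflicting key can be pulled out of a 409 body like the one logged above with a few lines of Python. This is only a sketch: the helper name `conflicting_keys` is hypothetical (it is not part of the submission tooling), and it assumes the `detail` string always has the `Keys conflict: [...]` shape seen in this log.

```python
import ast
import re


def conflicting_keys(error_body):
    """Return the (key_type, value) pairs named in a 409 'Keys conflict' detail.

    Hypothetical helper; assumes the detail text looks like the one in the log.
    """
    if error_body.get("code") != 409:
        return []
    match = re.search(r"Keys conflict:\s*(\[.*\])", error_body.get("detail", ""))
    if not match:
        return []
    # The bracketed part is a Python-style list of tuples, so parse it safely.
    return ast.literal_eval(match.group(1))


# Error body copied from the log entry above.
response = {
    "@type": ["HTTPConflict", "Error"],
    "status": "error",
    "code": 409,
    "title": "Conflict",
    "description": "There was a conflict when trying to complete your request.",
    "detail": "Keys conflict: [('alias', 'md5:884b33332f913ffa08d374e8715c4e95')]",
}

pairs = conflicting_keys(response)
# Keep only the md5 values, without the "md5:" prefix.
md5_values = [value.split(":", 1)[1] for _, value in pairs if value.startswith("md5:")]
print(pairs)
print(md5_values)
```

Here the conflicting key is the md5 of the file contents rather than the alias used in the follow-up GET, which would explain why that GET returns NOT FOUND.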

twang15 commented 2 years ago

Hi Tao,

It looks like the clash is via the md5sum (so it sees that identical file contents are being uploaded, and blocks that action), but the GET to try to see that file is using an alias that doesn't exist (the file on the portal has a very slightly different order of elements in its alias).

File on the portal with md5:884b33332f913ffa08d374e8715c4e95 has alias michael-snyder:SREQ-430-3-GCCAAT_S3_L001_R1_001.fastq.gz. The file you're trying to upload has the matching md5 but different alias, michael-snyder:SREQ-430-GCCAAT_S3_L001_R1_001.fastq.gz, which doesn't exist and that's why it can't be found.

Best, Ingrid
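
The situation described above (same md5, different alias) can be checked locally before posting. The following is a minimal sketch, not the actual submission code: `md5_of_file` and `classify` are hypothetical names, and the portal alias and md5 are simply the values quoted in this thread.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(local_alias, local_md5, portal_alias, portal_md5):
    """Describe how a local file relates to a file record already on the portal."""
    same_content = local_md5 == portal_md5
    same_alias = local_alias == portal_alias
    if same_content and same_alias:
        return "already submitted"
    if same_content:
        return "same content, different alias"
    if same_alias:
        return "same alias, different content"
    return "unrelated"


# Values quoted in this thread; in practice local_md5 would come from md5_of_file().
local_alias = "michael-snyder:SREQ-430-GCCAAT_S3_L001_R1_001.fastq.gz"
portal_alias = "michael-snyder:SREQ-430-3-GCCAAT_S3_L001_R1_001.fastq.gz"
md5 = "884b33332f913ffa08d374e8715c4e95"

print(classify(local_alias, md5, portal_alias, md5))
# prints "same content, different alias"
```

A result of "same content, different alias" means the POST will be rejected with a 409, and a lookup by the local alias will not find the existing record.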

twang15 commented 2 years ago

Hi Ingrid,

Thanks for identifying the root cause. Now I am confused: how were files from SREQ-430 already submitted, given that SREQ-430 and SREQ-431 were swapped? We identified this mistake after winter break, and the conflicting file is from what was then labeled SREQ-430 (before winter break), which should actually be SREQ-431.

twang15 commented 2 years ago

Hi Annika,

After a long and careful sift through the records, I have finally finished most of the ChIP submissions in this sheet: https://docs.google.com/spreadsheets/d/1Hj5Al2J_F0sUUolG-dUQZqpEFPEl-YUm/edit#gid=795058038

Right now, we are waiting for Ingrid’s notification on the completion of the fastq files. Some are to be moved from revoked (marked red) experiments; others are affected by the swap of SREQ-430 and SREQ-431 caused by the Sequencing Center.

After that, IPs, PCRs and possible_controls need to be manually submitted. It is a long way, but we are close to seeing the light.

Best, Tao

twang15 commented 2 years ago

Hi Ingrid,

Please proceed to move rep 3 for the following two experiments:

from https://www.encodeproject.org/ENCSR101DNY to https://www.encodeproject.org/experiments/ENCSR279OXE/

from https://www.encodeproject.org/ENCSR727GPE to https://www.encodeproject.org/ENCSR724QJT

Thanks, Tao

twang15 commented 2 years ago

cs-538 is completed https://www.encodeproject.org/experiments/ENCSR279OXE/

twang15 commented 2 years ago

cs-537 is completed https://www.encodeproject.org/experiments/ENCSR724QJT/

twang15 commented 2 years ago

All ChIPs in this bunch have been submitted successfully.