StanfordBioinformatics / pulsar_lims

A LIMS for ENCODE submitting labs.

ENCODE data submission: scAS File validation error #888

Closed twang15 closed 2 years ago

twang15 commented 2 years ago

Hi Ingrid, Jennifer and Tao,

There are 4 scATAC experiments on the portal, all part of a multiome series, that have file validation errors. Could you please let me know what happened here and how we can solve this?

Thanks, Annika

File validation error

ENCSR668XPI ENCSR435CGY ENCSR169BCG ENCSR628POB

twang15 commented 2 years ago

I also found this experiment has a similar issue: https://www.encodeproject.org/experiments/ENCSR924ZXU/

twang15 commented 2 years ago

SREQ-314 (auto submission):
https://www.encodeproject.org/experiments/ENCSR668XPI https://pulsar-encode.herokuapp.com/atacseqs/316
https://www.encodeproject.org/experiments/ENCSR435CGY https://pulsar-encode.herokuapp.com/atacseqs/315
https://www.encodeproject.org/experiments/ENCSR169BCG https://pulsar-encode.herokuapp.com/atacseqs/299

SREQ-308 (manual submission):
https://www.encodeproject.org/experiments/ENCSR628POB https://pulsar-encode.herokuapp.com/atacseqs/298

twang15 commented 2 years ago

It looks like the files with validation errors are different from the others. They all have 'trim' in their names, e.g., /workdir/encode_scatac_dcc_2/results/ENCSR628POB-1/fastqs/R1_trim.fastq.gz
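The observation above can be checked mechanically. The following is a minimal sketch, assuming the file paths for an experiment have already been collected into a list. The helper name `split_by_trim` and the second example path are hypothetical and used only for illustration.

```python
from pathlib import PurePosixPath


def split_by_trim(paths, marker="trim"):
    """Partition paths by whether the file name (not the directory) contains `marker`."""
    flagged, others = [], []
    for p in paths:
        name = PurePosixPath(p).name
        if marker in name:
            flagged.append(p)
        else:
            others.append(p)
    return flagged, others


paths = [
    # Path quoted in this thread.
    "/workdir/encode_scatac_dcc_2/results/ENCSR628POB-1/fastqs/R1_trim.fastq.gz",
    # Hypothetical path standing in for a file without 'trim' in its name.
    "/hypothetical/uploads/sample_R1.fastq.gz",
]

flagged, others = split_by_trim(paths)
print("with 'trim':", flagged)
print("without 'trim':", others)
```

Matching on the file name rather than the full path avoids false positives from directory names that happen to contain the marker.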

twang15 commented 2 years ago

Hi Annika,

The problematic files were not submitted by us (it looks like they came from Anshul’s lab). You can see they have very different file names, e.g.,

/workdir/encode_scatac_dcc_2/results/ENCSR668XPI-1/fastqs/R1_trim.fastq.gz

-Tao

twang15 commented 2 years ago

Hi Annika and Tao,

Tao is correct: these files are part of the processing submitted by Anshul, and the flag is not due to any error on your part. These filtered FASTQs have some unexpected structure in their read names, and we will patch the read_name_details property to account for it once all the snATAC processed data files are submitted to the portal.
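To see what the read names in one of these filtered FASTQs look like, one could print the first few header lines. This is only a sketch: it assumes a locally downloaded gzipped FASTQ with standard four-line records, it makes no assumption about what the unexpected structure is, and the function name `first_read_names` is hypothetical.

```python
import gzip
from itertools import islice


def first_read_names(fastq_gz_path, n=5):
    """Return the first `n` read-name (header) lines of a gzipped FASTQ.

    Assumes standard four-line FASTQ records, so headers sit on
    lines 0, 4, 8, ... of the file.
    """
    with gzip.open(fastq_gz_path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, 0, 4 * n, 4)]
```

For example, `first_read_names("R1_trim.fastq.gz")` would return up to five header lines from a local copy of that file, which can then be compared by eye with the headers of a file that passes validation.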

Thanks for flagging this up, and let me know if you have any other questions/concerns about the snATAC datasets!

Best, Ingrid