Open twang15 opened 3 years ago
Hi Annika,
We can have a discussion tomorrow morning after meeting with DCC.
Here is the github issue for this submission: https://github.com/StanfordBioinformatics/pulsar_lims/issues/481
Best, Tao
Download Reference file to scg
Hi Annika,
Could you help me prepare the submission sheet for the Greenleaf lab? In tab “Reference_Tao”, please fill in the last column (marked in red).
https://docs.google.com/spreadsheets/d/1kxFkyQg19nLxj6kdhBS_pN8dzLp9xrNh/edit#gid=1329654463
Thanks, Tao
Reference file uploaded
Hi Annika,
https://docs.google.com/spreadsheets/d/1kxFkyQg19nLxj6kdhBS_pN8dzLp9xrNh/edit#gid=1292157095
For FunctionalCharacterizationExperiment, assay_term_name should be one in the following list:
"CRISPR screen",
"MPRA",
"perturbation followed by scRNA-seq",
"perturbation followed by snATAC-seq",
"pooled clone sequencing",
"STARR-seq"
Which one is the right one?
Thanks, Tao
functional_characterization_experiment is the right profile for Functional characterization experiment. Many others are available.
perturbation followed by snATAC-seq
Thanks, Annika
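A minimal sketch of how the assay_term_name choice above could be checked before a row goes into the submission sheet. The allowed values are copied from the list quoted in this thread; the function name check_assay_term_name is hypothetical and is not part of any portal tooling.

```python
# Allowed values copied from the list quoted earlier in this thread.
ALLOWED_ASSAY_TERM_NAMES = {
    "CRISPR screen",
    "MPRA",
    "perturbation followed by scRNA-seq",
    "perturbation followed by snATAC-seq",
    "pooled clone sequencing",
    "STARR-seq",
}


def check_assay_term_name(value: str) -> str:
    """Return the value unchanged if it is in the allowed list, else raise."""
    if value not in ALLOWED_ASSAY_TERM_NAMES:
        allowed = ", ".join(sorted(ALLOWED_ASSAY_TERM_NAMES))
        raise ValueError(f"Unknown assay_term_name {value!r}; expected one of: {allowed}")
    return value


# The value chosen in this thread passes the check.
check_assay_term_name("perturbation followed by snATAC-seq")
```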
Hi SCG administrators,
I (taowang9) need read permission on /oak/stanford/groups/wjg/jgranja/GEO_SpearATAC .
Could you please help me out?
Best, Tao
Hi Annika,
https://docs.google.com/spreadsheets/d/1kxFkyQg19nLxj6kdhBS_pN8dzLp9xrNh/edit#gid=1812279104
I remember that only a subset of all the files need to go to the portal. Could you please let me know the details?
Thanks, Tao
Tao,
That directory has open permissions. I've added execute bit for you on /oak/stanford/groups/wjg/jgranja so that you can get to it. Try it now.
Taymoor Arif
Hi Tao, I marked everything red that should be uploaded in your file tab. It’s all the files with file_format “fastq” or “bam”.
Let me know if you have questions, Annika
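The rule Annika describes (upload only files whose file_format is “fastq” or “bam”) could be applied to the sheet rows roughly as follows. This is only a sketch: the row layout and the example file names are made up for illustration, and the real sheet may use different columns.

```python
# Formats that should go to the portal, per Annika's message.
UPLOAD_FORMATS = {"fastq", "bam"}


def select_files_for_upload(rows):
    """Keep only rows whose file_format is one of the upload formats."""
    return [
        row for row in rows
        if row.get("file_format", "").strip().lower() in UPLOAD_FORMATS
    ]


# Hypothetical rows standing in for the "Files" tab of the spreadsheet.
example_rows = [
    {"submitted_file_name": "sample_R1.fastq.gz", "file_format": "fastq"},
    {"submitted_file_name": "sample.bam", "file_format": "bam"},
    {"submitted_file_name": "sample.fragments.tsv.gz", "file_format": "tsv"},
]

selected = select_files_for_upload(example_rows)
for row in selected:
    print(row["submitted_file_name"])
```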
Hi Annika,
https://docs.google.com/spreadsheets/d/1kxFkyQg19nLxj6kdhBS_pN8dzLp9xrNh/edit#gid=1467081546
Platform is required for file submission. But the platform “Illumina NextSeq 550” does not exist on the portal.
Could you please help register it?
Thanks, Tao
Hi Annika,
The platform is now registered at: https://www.encodeproject.org/platforms/NTR:0000655/ It also has the alias encode:NextSeq550.
Best, Jennifer
Hi Jennifer,
Thanks for the registration! However, I cannot access it at this time. Please grant me access.
Error message: Not available
Your account is not allowed to view this page.
Best, Tao
Hi Sarah,
Could you answer the following questions for us?
Since we are going to model the processing, can we get some details expanding upon “To see an example analysis for these fastq files see https://github.com/GreenleafLab/SpearATAC_MS_2021/tree/main/AlignSgRNA. You can find example fastqs and the pipeline used to process these fastqs into a table corresponding to each sgRNA.”? Essentially, we need to know the tools and versions used to process the data.
Thanks, Tao
Hi Annika and Tao,
I was reviewing the SPEAR data and came up with the following FASTQs that failed validation:
I see that some of the errors are due to suspected duplication between files. For example, the file ENCFF306PQP (will-greenleaf:scATAC_GM_LargeScreen_Rep4_R1_001.fastq.gz) seems to be “conflicting” with ENCFF661VYE, ENCFF925HWL, ENCFF606KQM, ENCFF178OBX, … What I find confusing is that some of the files belong to other experiments: for example, ENCFF306PQP is from ENCSR820VEF (will-greenleaf:sp_experiment8), but ENCFF661VYE is from ENCSR138NWM (will-greenleaf:sp_experiment7). Perhaps it is all a demultiplexing issue, but I want to check before I let these files through.
Thanks,
Idan
Hi Idan,
I double-checked the original submission sheet and did not find any problem on my side, unless the original experiment grouping was wrong.
@Amy, could you help elaborate on this issue?
Thanks, Tao
Found an email from August with the answer in regard to sgRNAs. Tao, please feel free to contact me if the answer below does not make sense.
Idan
Hi Sarah,
I’ve prepared the submission sheet for the “pooled clone sequencing” FunctionalCharacterizationExperiment sgRNA.
Could you please provide more information about the libraries “Library-sgRNA” in this spreadsheet? https://docs.google.com/spreadsheets/d/1kxFkyQg19nLxj6kdhBS_pN8dzLp9xrNh/edit#gid=1322748522
Thanks, Tao
Hi Tao,
The following 6 "elements reference" files are flagged with a "content error": https://www.encodeproject.org/report/?type=File&submitted_by.%40id=%2Fusers%2F9e077d38-a99b-4f84-8c79-6c75cf505731%2F&field=%40id&field=output_type&field=aliases&field=content_error_detail&sort=output_type&lab.title=Will+Greenleaf%2C+Stanford&status=content+error&status=uploading&output_type=elements+reference
This is because the portal expects .fasta files to be gzipped (.fasta.gz). Would you be able to gzip the 6 files and then reupload?
Please let me know when you are ready to do so, and I can reset the status of the files for that.
Separately, the DCC was able to resolve the validation errors on the index reads files after adding read structure metadata. For the reads with potential duplication errors, we are awaiting further info.
Thanks! Jennifer
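For the gzip step Jennifer asks for, a small sketch using only the Python standard library is shown here. The function name gzip_fasta and the example file name are hypothetical; running the gzip command-line tool on each file would work just as well.

```python
import gzip
import shutil
from pathlib import Path


def gzip_fasta(path):
    """Write a gzipped copy of a FASTA file next to it and return the new path.

    The original file is left in place; the copy gets a ".gz" suffix appended.
    """
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Usage (hypothetical file name):
# gzip_fasta("elements_reference_1.fasta")  # writes elements_reference_1.fasta.gz
```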
Hi Jennifer,
These files are ready for resubmission. Please reset the status for me.
Best, Tao
Hi Tao,
Thanks! I have changed the status to uploading, please feel free to resubmit the files now.
Jennifer
Hi Jennifer,
I tried but failed with the following error. Could you please check the credential status again?
2021-11-02 14:22:20,383:eu_debug: Attempting to generate new file upload credentials
2021-11-02 14:22:20,534:eu_debug: Error 403: unable to re-issue upload credentials for 'ENCFF617HLE'
2021-11-02 14:22:20,537:eu_debug: {
  "@type": [ "HTTPForbidden", "Error" ],
  "code": 403,
  "description": "Access was denied to this resource.",
  "detail": "status must be \"uploading\" to issue new credentials",
  "status": "error",
  "title": "Forbidden"
}
Thanks, Tao
This resubmission is done.
Hi Annika and Idan,
Here is what has been finished since our last meeting, and I want to update the current status of this submission:
Best, Tao
Sure, Annika. I think I will be able to move forward once Sarah finishes the library-sgRNA sheet. There may be other issues popping up later, but I will let everyone know if any.
Best, Tao
From Sarah:
Okay sorry I need a little more clarification, so I am only filling out the "Library-sgRNA" sheet in that Excel file? And should there be a row for every sgRNA sample that was submitted (so there should be 19 total)?
For the construction platform, it was a homemade protocol rather than a kit -- how should I indicate that?
And then how do I link the information between the "Library-sgRNA" sheet and the "Files-sgRNA" sheet so that you know which library coordinates with which Files?
Hi Sarah and Idan,
Idan, could you comment on the construction platform field?
Sarah, for the other two questions,
my understanding is that for each biosample, there is one library. But I am not sure whether an sgRNA library shares the same biosample as a Spear-ATAC library. Do they share the biosample in your experiment? If so, we may use the same registered biosample ENC on the portal.
For the link between a library and files, it is established through another sheet: “Replicate-sgRNA” (https://docs.google.com/spreadsheets/d/1kxFkyQg19nLxj6kdhBS_pN8dzLp9xrNh/edit#gid=1253730928); you can refer to sheet “Replicate-Tao” as a starting point.
Please note that the submission is a multi-step process; we will need to revisit the other sheets once the experiments and libraries have been submitted.
Best, Tao
Hi Sarah, Will,
The problem with the files from your experiments (and this is why they are flagged) is that the headers/seq identifiers in the files are not unique, also between files that belong to different experiments. The DCC validator picked this up for several files in your experiments. This happens sometimes when samples were multiplexed and unique sample information is not retained when demultiplexed. Someone from the Greenleaf team has to check that the files were properly demultiplexed and that the information given to me/Tao reflects the correct file names.
Please let us know if there are questions,
Annika
Hi Sarah and Will,
There is another problem that I noticed. DCC requires the index read of spearATAC experiments to have “read_structure” information.
You may need to schedule a meeting with Ingrid and Jennifer to discuss the details for this field and let me know the result. This information depends on how the experiment was done, and we cannot solve it without your input.
Best, Tao
Tao,
{"sequence_element": "barcode", "start": 1, "end": 16}
Thanks, Annika
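To make the read_structure entry above concrete, here is a rough sketch of how such an entry could be applied to an index read. It assumes that start and end are 1-based and inclusive, which this thread does not state, and the example read sequence is made up.

```python
import json

# read_structure entry as quoted in Annika's message.
READ_STRUCTURE = json.loads('{"sequence_element": "barcode", "start": 1, "end": 16}')


def extract_element(read_sequence, structure):
    """Slice the described element out of a read.

    Assumption: "start" and "end" are 1-based, inclusive positions.
    """
    return read_sequence[structure["start"] - 1 : structure["end"]]


# Made-up 20-base index read; the first 16 bases would be the barcode.
example_read = "ACGTACGTACGTACGTTTTT"
barcode = extract_element(example_read, READ_STRUCTURE)
print(READ_STRUCTURE["sequence_element"], barcode, len(barcode))
```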
From Sarah:
Hi Annika,
I don't think I understand what you mean by "the headers/seq identifiers in the files are not unique, also between files that belong to different experiments." Do you mean the names of the files themselves? From what I see in "FunctionalCharacterizationExperiment_Tao" which lists all of the files, they all appear to have unique names. We know that the files were demultiplexed properly because we went forward and analyzed the data from there -- it would have been quite obvious if any of the reads were from the wrong experiment (diff cell lines, diff sgRNAs, etc.).
Hi Sarah,
If library construction was done using a “homemade” protocol, you should send the PDF to Tao or me; we would register it on the portal and then associate it with the libraries in question.
In regard to your question about non-uniqueness:
If the DCC looks at two FASTQ files with read names like:
FILE1 @INST1:234:LXH5YIU:3:12:235:11:34 1::0:N
FILE2 @INST1:2122334:LXH5YIU:3:11:231:12:3 1::0:N
we will flag FILE1 and FILE2 as suspected duplicates. This is because the read names suggest that the flowcell and the lane of the reads in these two files were the same (LXH5YIU:3). We understand that this type of “potential” duplication may result from validly demultiplexed files, but we do not have a simple or efficient way to validate correctness, so we rely on the labs to take a look and make sure there was no erroneous submission of duplicate FASTQs that were not supposed to have this apparent similarity in the read names.
I hope that explains Annika’s question.
Idan
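The check Idan describes can be illustrated with a short sketch that pulls the flowcell and lane out of a read name and compares them across files. This is not the DCC validator's implementation; it assumes colon-separated read names with the flowcell in the third field and the lane in the fourth, as in the two examples above.

```python
def flowcell_lane(read_name):
    """Return (flowcell, lane) from a colon-separated FASTQ read name.

    Assumption: fields are instrument:run:flowcell:lane:..., optionally
    followed by a space and a second, comment-like part that is ignored.
    """
    fields = read_name.lstrip("@").split(" ")[0].split(":")
    if len(fields) < 4:
        raise ValueError(f"Unexpected read name format: {read_name!r}")
    return fields[2], fields[3]


# The two read names quoted in Idan's message.
file1_read = "@INST1:234:LXH5YIU:3:12:235:11:34 1::0:N"
file2_read = "@INST1:2122334:LXH5YIU:3:11:231:12:3 1::0:N"

same_flowcell_lane = flowcell_lane(file1_read) == flowcell_lane(file2_read)
print(flowcell_lane(file1_read), flowcell_lane(file2_read), same_flowcell_lane)
```

A match only marks the pair for a manual look by the lab; as noted above, properly demultiplexed files from one lane will share these fields too.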
https://docs.google.com/document/d/18HWBgcL8nrYF90-JcYRWcwAOJOuQenXJm-6X_4HeK3s/edit#
Submit the SpearATAC processed data in one .tar file
From William Greenleaf:
Are we all set for these data uploads then? W
I think we are waiting for Sarah’s response?
Idan
Hi Idan,
I attached the protocol for making the sgRNA libraries.
Also yes, the files were demultiplexed appropriately.
Thanks!
Best, Sarah
Thank you Sarah, we will upload the protocol and will patch the FASTQs that apparently have no duplication.
Tao, I am not sure where you stand in regard to the creation of pooled-clone-sequencing experiments?
Thanks,
Idan
Hi Idan and everyone,
Thanks for pushing this forward.
I am working on it now, and will keep everyone posted.
Best, Tao
Hi Idan,
I have one quick question:
We decided to combine scATAC_K562_Pilot_Rep1.fragments.tsv.gz and scATAC_K562_Pilot_Rep1.singlecell.csv into one tarball. Could you please let me know which “output_type” we should use for the tarball submission?
Thanks, Tao
The last groups of paired files for scATAC were tarred into one file and have been submitted.
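A minimal sketch of the bundling step, using Python's standard tarfile module. The helper name bundle_files and the output file name in the usage comment are hypothetical; the two input names are the ones mentioned in Tao's question. The archive is written uncompressed, on the assumption that a plain .tar is what was agreed.

```python
import tarfile
from pathlib import Path


def bundle_files(paths, tar_path):
    """Write the given files into one uncompressed .tar archive.

    Files are stored under their base names so the archive has no
    directory structure from the local machine.
    """
    with tarfile.open(tar_path, "w") as tar:
        for path in paths:
            tar.add(path, arcname=Path(path).name)
    return Path(tar_path)


# Usage (output name is hypothetical):
# bundle_files(
#     ["scATAC_K562_Pilot_Rep1.fragments.tsv.gz",
#      "scATAC_K562_Pilot_Rep1.singlecell.csv"],
#     "scATAC_K562_Pilot_Rep1.tar",
# )
```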
Hi Idan,
The sgRNA experiment submission fails the validation. But I can only see the following client-side error message.
Could you please take a look and share your thoughts?
Thanks, Tao
2021-11-22 10:23:36,781:eu_debug: <<<<<< POST functional_characterization_experiment record will-greenleaf:Spear_ATAC-K562-Pilot_sgRNA To DCC with URL https://www.encodeproject.org/functional_characterization_experiment and this payload:
{
  "aliases": [ "will-greenleaf:Spear_ATAC-K562-Pilot_sgRNA" ],
  "assay_term_name": "pooled clone sequencing",
  "award": "/awards/UM1HG009436/",
  "biosample_ontology": "/biosample-types/cell_line_EFO_0002067/",
  "elements_mappings": [ "ENCSR858ICE" ],
  "elements_references": [ "ENCSR867SMQ" ],
  "lab": "/labs/will-greenleaf/",
  "plasmids_library_type": "gRNA cloning"
}
2021-11-22 10:23:36,928:eu_debug: Failed to POST will-greenleaf:Spear_ATAC-K562-Pilot_sgRNA
2021-11-22 10:23:36,929:eu_debug: <<<<<< DCC POST RESPONSE:
2021-11-22 10:23:36,932:eu_debug: {
  "@type": [ "ValidationFailure", "Error" ],
  "code": 422,
  "description": "Failed validation",
  "errors": [
    {
      "description": "{'elements_mappings': ['ENCSR858ICE'], 'elements_references': ['ENCSR867SMQ'], 'lab': '/labs/will-greenleaf/', 'award': '/awards/UM1HG009436/', 'biosample_ontology': '/biosample-types/cell_line_EFO_0002067/', 'assay_term_name': 'pooled clone sequencing', 'aliases': ['will-greenleaf:Spear_ATAC-K562-Pilot_sgRNA'], 'plasmids_library_type': 'gRNA cloning'} is not valid under any of the given schemas",
      "location": "body",
      "name": []
    }
  ],
  "status": "error",
  "title": "Unprocessable Entity"
}
Hi Tao,
Please add
"control_type": "control",
and try again
Idan
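For reference, the payload from the failed POST with Idan's suggested field added would look roughly like the output of the sketch below. All other values are copied from the log above; the thread does not explain why the schema requires control_type here, so this only shows the change, not the reason for it.

```python
import json

# Payload copied from the failed POST in the log above.
payload = {
    "aliases": ["will-greenleaf:Spear_ATAC-K562-Pilot_sgRNA"],
    "assay_term_name": "pooled clone sequencing",
    "award": "/awards/UM1HG009436/",
    "biosample_ontology": "/biosample-types/cell_line_EFO_0002067/",
    "elements_mappings": ["ENCSR858ICE"],
    "elements_references": ["ENCSR867SMQ"],
    "lab": "/labs/will-greenleaf/",
    "plasmids_library_type": "gRNA cloning",
}

# The one field Idan asked to add before retrying.
payload["control_type"] = "control"

print(json.dumps(payload, indent=2, sort_keys=True))
```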
Hi Idan,
It works. The next question is about the construction_platform for the sgRNA libraries. Do you know which platform we should use?
Thanks, Tao
If you have a 10X kit/platform identifier, we can register a new Platform object on the portal to link to. If not, we can leave it unspecified.
Idan
Hi Idan,
This is the 10X identifier that I used in Snyder Lab’s submission. https://www.encodeproject.org/platforms/NTR:0000452/
Can we use the same one for the current spearAtac submission?
Thanks, Tao
When I read Sarah’s sgRNA library generation protocol, no 10X specific details were specified, so I am not sure. It is a question for Sarah (or someone that knows exactly how the experiment was performed).
Idan
From Sarah:
No 10x specific protocols were used to generate the sgRNA libraries after the scATAC libraries were made. Please note that the sgRNA libraries are derivatives of the scATAC libraries -- e.g. the scATAC library is the starting material to create the sgRNA library, and therefore the sgRNA library is just a subset of the scATAC library. I hope that clarifies things.
Thanks Sarah,
Tao, please submit these libraries without platform info.
Idan
Hi Tao,
We have a submission for Will Greenleaf’s lab that I have been working on for a while now. It’s a bit more complicated, and I was wondering if we could have a call this week to go through the spreadsheet together so I can explain everything to you. I can’t meet on Thursday during our usual time because I will be at the German embassy. I could talk tomorrow after our DCC meeting or Wednesday morning. Let me know what works best for you.
Here’s a link to the spreadsheet in the meantime. I will explain the tweaks to you when we meet. https://docs.google.com/spreadsheets/d/1kxFkyQg19nLxj6kdhBS_pN8dzLp9xrNh/edit#gid=1812279104
Thanks! Annika