StanfordBioinformatics / pulsar_lims

A LIMS for ENCODE submitting labs.

ENCODE data submission: spear Atac for Eyal in Greenleaf lab #538

Closed twang15 closed 2 years ago

twang15 commented 3 years ago

Hi Tao,

Could you please work with Eyal to fill in the missing pieces for his spear ATAC submission (it should be similar to Sarah’s); see the attached spreadsheet. This should be up on the portal before 9/1. We need more information on the perturbation (GM), the reference data sets, and where the fastq files and processed data are. If you can start filling the gaps with him, I can then help with whatever is still missing.

Thanks! Appreciate your help, Annika

twang15 commented 3 years ago

Hi Eyal,

Nice to e-meet you! This is Tao, the ENCODE data manager for the Snyder lab, and we are going to work together on this submission.

I’ve shared with you two files:

  1. Submission sheet for Sarah’s dataset: https://docs.google.com/spreadsheets/d/1kxFkyQg19nLxj6kdhBS_pN8dzLp9xrNh/edit#gid=840633303
  2. Submission sheet for Eyal’s dataset: https://docs.google.com/spreadsheets/d/1ooKKROR_ph4Vzogt0kGxr0ko4VKpeCnu/edit#gid=2018822737

Our goal is to fill in the second submission sheet together; the first submission sheet can serve as an example whenever you are unsure what information to enter.

The first step is to submit the reference experiment. Please do the following for me:

  1. Fill in the submission file “Reference_Eyal” as much as possible.
  2. Fill in the submission file “Biosample_Eyal” as well.

After 1 and 2 are done, I will do my part and let you know the next step on your side.

Please let me know if you have questions.

Best, Tao

twang15 commented 3 years ago

Hi all, I've just finished talking with Tao about my data and wanted to point out that I will submit only the snATAC data and not the sgRNA data (the induction failed). If I understand the spreadsheet correctly, there is no reason to fill in GM1/2 for this data, right? Or, because the cell line has dCas9 integrated, do I need to mention it?

Anyway, please tell me how to proceed, and sorry about the confusion. Have a nice weekend, Eyal

twang15 commented 3 years ago

Hi Annika,

I’ve had a meeting with Eyal, but we still have questions. Could you answer the questions that Eyal raised in his previous email?

Thanks, Tao

twang15 commented 3 years ago

Hi Tao,

Thanks for figuring this out and working on the submission. I am not sure about the answer to the question. I’d think it’s not necessary to add anything; maybe a submitter comment so that the user knows about the Cas9 integration? And then just proceed with a normal scATAC submission? What do you think about it from the DCC perspective, Ingrid and Jennifer? Best, Annika

twang15 commented 3 years ago

Hello all,

Restating to make sure I understand: so if the sgRNA induction step failed, the sample and results for this data should be identical to a "standard" snATAC experiment (except that the cells happen to have a Cas9 integration)? If there's a chance the data could at all be affected by Cas9, it could be useful to still include a GM indicating that Cas9 is integrated into the cells of the biosample (but then you would ignore other aspects of CRISPR-related submission). The submitter comment could also mention that Cas9 is integrated into these cells, but not used (because guides aren't present).

Would the files produced by the snATAC still be similar to Sarah's spearATAC, so that Tao can follow the submission details being worked out with Idan? Or are the resulting files somewhat different?

Lastly, Eyal, sorry about the wait. I can update your account permissions to give you access to unreleased ENCODE data. Can you confirm that you are a postdoc with Will Greenleaf?

Thanks, Ingrid

twang15 commented 3 years ago

Hi Eyal,

Could you reply to Ingrid’s questions below as you’re the expert for your data?

Thanks, Annika

twang15 commented 3 years ago

Hi all, I'm out until next week and have limited signal. Regarding the data, I would submit it as snATAC data with a GM that says the cell line has dCas9 in it.

Yes, I'm a postdoc in the Greenleaf lab. Eyal

twang15 commented 3 years ago

Hi all, I'm back and following up on the thread to see if I can help move the submission process forward.

Do I have the required access to the portal so I can start generating the GM for the cell line I used? The files generated are regular snATAC fastq files (as stated in the submission sheet).

Let me know if I can/need to do something to facilitate this upload, Eyal

twang15 commented 3 years ago

Hi Eyal and Tao,

Eyal, you now have permissions on the ENCODE portal and should be able to see unreleased data in progress. I have also made a matching account for you on our test portal, which is our 'sandbox' for trying out first-time submissions: https://test.encodedcc.org/

I'm not certain how much of the actual submission process you are taking on yourself versus providing Tao with the details for. Tao, please let me know if you're handling all of the ENCODE side of the submission process using Pulsar, or if I should take some time to go over onboarding to portal submissions.

Thanks, Ingrid

twang15 commented 3 years ago

Hi Eyal,

Here are the first two steps on your side:

  1. Please go ahead and create the GM and Biosamples on the portal.
  2. Could you assign IDs (numbers are OK) to each of your snATAC experiments and then mark which biosample and fastq files belong to which experiment?

If you have questions, please let us know.

Thanks, Tao

twang15 commented 3 years ago

Hi Eyal,

Could you give me an update on the progress for this submission, please?

Thanks, Annika

twang15 commented 3 years ago

Hi Annika, I was planning to do it today, but it was pushed back to tomorrow. I'll update you when I'm done with the two points Tao wrote.

Best, Eyal

twang15 commented 3 years ago

Hi Tao, I filled in the GM_Eyal sheet. What's my next step? Do you upload it, and then we can get the GM#?

Best, Eyal

twang15 commented 3 years ago

Hi Tao,

Yes, I think that might be necessary, since currently it's tied to the same term with 'gRNAs'. I requested access to the spreadsheet, could you or Eyal grant this to me?

I don't want a software update on our end to hold up your submission, so I think adding a submitter_comment about the Cas9 integration on the datasets for now will be sufficient, and you can continue with submitting your other objects. I'll work with you to make sure we are able to model a GeneticModification that our schema can accept, and we can patch that onto the biosamples at a later date.

Best, Ingrid
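The interim step Ingrid describes above (record the Cas9 integration in a `submitter_comment` now, add a proper GeneticModification later) could be sketched roughly as follows. This is only an illustration under assumptions: the helper name `submitter_comment_body` and the comment wording are hypothetical, and the thread does not specify how the body would actually be sent to the portal (endpoint, authentication, or tooling such as Pulsar), so that part is left out.

```python
import json


def submitter_comment_body(comment: str) -> str:
    """Build a JSON body that sets only the submitter_comment property.

    Hypothetical helper: it only serializes the payload and sends nothing.
    """
    return json.dumps({"submitter_comment": comment})


# Illustrative wording only; the real comment is for the submitters to decide.
body = submitter_comment_body(
    "Cas9 is integrated into the cells of this biosample."
)
print(body)
```

Keeping the body limited to the one property reflects the plan in the message: other objects can be submitted unchanged, and the genetic modification can be attached to the biosamples at a later date.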

twang15 commented 3 years ago

Hi Tao and Eyal,

Thank you! I guess one question I have right away is, if the guide RNAs aren't successfully being introduced, should they be mentioned at all? My understanding was that there should be a GM describing the Cas9 being introduced to cells because this step was successful, but that the gRNA part didn't work in the protocol. Eyal, can you clarify what happened with the biosamples?

Best, Ingrid

twang15 commented 3 years ago

Hi Eyal,

No worries, I can confirm that the submission process is definitely confusing and challenging to navigate! It makes sense then that we specify only the presence of CRISPR/Cas.

I had another broad question: was the dataset initially intended as another "perturbation followed by snATAC-seq" assay, such as these on the portal? I'm wondering if the value in still uploading this dataset (lacking the perturbation/editing) is for use as a sort of baseline or control for the other datasets.

Thanks, Ingrid

twang15 commented 3 years ago

Good morning. Yes, I think it is like the examples you've sent.

Where are we regarding finishing the submission? Do I need to do anything else for now? Just making sure.

Best, Eyal

twang15 commented 3 years ago

Hi Eyal,

We have worked through many questions along the way. The answers to them are critical for the submission, though we are still at the very beginning of submitting the GM and Biosamples.

I think Ingrid has another question for you; please reply to her most recent email. @Ingrid, the question I had about GMs is still unanswered. What shall we do about cell D3?

Best, Tao

twang15 commented 3 years ago

Hi Eyal and Tao,

We're having difficulty representing the genetic modifications well here, in part because we are worried that introducing new terms for your specific case will cause confusion for the way we have modeled other CRISPR editing throughout the portal. I think I need more detail about the modifications that were attempted and exactly which part failed. Our current model has 1 GM that describes supplying the CRISPR/Cas9 machinery and guide RNAs to the cells, and 1 GM that describes the "action" of the machinery and guides on the cells of the biosample (any actual CRISPR editing). I believe you said the "induction" failed; can you explain exactly what that means in your case? Are the guides present in the cells with no editing taking place, or are the guides missing but Cas present?

If your induction is equivalent to the second GM described above (where the "action" of CRISPR editing takes place), I think we can include your guides in the 1st GM and leave out the second one (again, with a submitter_comment clarifying up front that editing has not taken place).

Lastly, I'm still wondering if this data is now to be used as a control for other spearATAC data or similar, since it is not "perturbation followed by snATAC-seq"/spearATAC itself.

Thank you! -Ingrid

twang15 commented 3 years ago

Hi Ingrid and Tao, OK, good questions. Let's clarify:

The cell line has CRISPRi/dCas9 in it, and it was induced in all the cells. Prior to differentiation, the cells were transduced with an sgRNA virus and selected, which means most of them should have either a control guide or a targeting guide.

In the experiment I performed SpearATAC at 3 time points: Day 0, 7, and 10. The snATAC data (of the induced cells) is of great quality. On the other hand, guide capture was very low: 5000/18000 cells had a detectable guide in them, of which ~3000 were control guides, supposedly without any effect; of the remaining ~2000 cells, there were on average 150 cells for each of the 8 targeting guides. The analysis showed that the targeting guides had no effect on the cells; therefore, the biology failed.

Our interpretation of this experiment is that the guides didn't affect the differentiation, and therefore the data should be regarded as just snATAC. If you think differently, I can also send the guides' sequences as Sarah did and submit it as SpearATAC data.

I hope this is helpful, Eyal
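To make the guide-capture numbers in the message above easier to compare, here is a minimal sketch that tallies them as fractions. It assumes the approximate counts exactly as quoted (18000 cells in total, 5000 with a detectable guide, about 3000 of those carrying control guides); the function name `guide_capture_summary` is hypothetical and not part of any pipeline mentioned in the thread.

```python
def guide_capture_summary(total_cells, cells_with_guide, control_guide_cells):
    """Summarize guide capture from the approximate counts quoted in the thread."""
    targeting_guide_cells = cells_with_guide - control_guide_cells
    return {
        "targeting_guide_cells": targeting_guide_cells,
        # Fraction of all cells in which any guide was detected.
        "capture_fraction": cells_with_guide / total_cells,
        # Fraction of all cells carrying a targeting (non-control) guide.
        "targeting_fraction": targeting_guide_cells / total_cells,
    }


summary = guide_capture_summary(
    total_cells=18_000, cells_with_guide=5_000, control_guide_cells=3_000
)
print(f"cells with a detectable guide: {summary['capture_fraction']:.1%}")
print(f"cells with a targeting guide:  {summary['targeting_fraction']:.1%}")
```

With these inputs, roughly 28% of cells have any detectable guide and roughly 11% carry a targeting guide, which is consistent with the description of guide capture as very low.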

twang15 commented 2 years ago

Hello all,

Summing up after the call: this data won't be deposited, as we determined it wouldn't be useful to the majority of users. (It would either be presented as snATAC with some irrelevant CRISPR involvement/elements, or as spearATAC without widespread successful CRISPR editing and differentiation.)

Thank you all for working through this; I'm only sorry it took so long to reach this conclusion! Best, Ingrid