StanfordBioinformatics / pulsar_lims

A LIMS for ENCODE submitting labs.

Modeling: Submission of WTC-11 Ngn2 ChIP experiments #802

Open twang15 opened 2 years ago

twang15 commented 2 years ago

Hi Tao, here is some info to start thinking about for the Ngn2 submission. See how far you get with understanding what’s going on here. Let’s schedule a meeting either Thursday or Friday morning?

Submission of WTC-11 Ngn2 ChIP experiments

We used the following 4 biosamples in our experiments:

  1. WTC-11 that is modified with Ngn2 as control
  2. WTC-11 that is modified with Ngn2 and an additional genetic modification to tag the TF with GFP
  3. WTC-11 that is modified with Ngn2 —> and then differentiated
  4. WTC-11 that is modified with Ngn2 and an additional genetic modification to tag the TF with GFP —> and then differentiated

For all of them: The Ngn2 GM can be added with ENCGM156EZW: https://www.encodeproject.org/genetic-modifications/ENCGM156EZW/

For 2. and 4.: we add an additional GM the way we always did for our ChIP experiments (GFP)

For 3. and 4.: the biosample has additional information similar to https://www.encodeproject.org/biosamples/ENCBS229NUR/ —> in vitro differentiated cells, term name: excitatory neurons

Biosamples have to be linked: 3 (in my list above) originates from 1 // 4 originates from 2 (it’s basically the same biosample, but differentiated).
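
A minimal sketch of the four biosamples and their links, written as plain Python dictionaries. It assumes the ENCODE biosample schema uses the property names `genetic_modifications`, `originated_from`, and `biosample_ontology`; the aliases, the GFP modification alias, and the ontology labels are hypothetical placeholders, not real portal identifiers.

```python
# Sketch only: every alias and ontology label below is a hypothetical placeholder.
NGN2_GM = "/genetic-modifications/ENCGM156EZW/"
GFP_GM = "example-lab:gfp-tag-gm"  # hypothetical alias for the GFP-tag modification


def make_biosample(alias, gms, differentiated=False, parent=None):
    """Build a payload-like dict for one biosample."""
    sample = {
        "aliases": [alias],
        "genetic_modifications": list(gms),
        # Placeholder label; a real payload links to a biosample type object.
        "biosample_ontology": (
            "in vitro differentiated cells: excitatory neurons"
            if differentiated
            else "cell line: WTC-11"
        ),
    }
    if parent is not None:
        # The differentiated sample points back at the sample it came from.
        sample["originated_from"] = parent
    return sample


biosamples = {
    1: make_biosample("example-lab:wtc11-ngn2", [NGN2_GM]),
    2: make_biosample("example-lab:wtc11-ngn2-gfp", [NGN2_GM, GFP_GM]),
}
biosamples[3] = make_biosample(
    "example-lab:wtc11-ngn2-diff",
    [NGN2_GM],
    differentiated=True,
    parent=biosamples[1]["aliases"][0],
)
biosamples[4] = make_biosample(
    "example-lab:wtc11-ngn2-gfp-diff",
    [NGN2_GM, GFP_GM],
    differentiated=True,
    parent=biosamples[2]["aliases"][0],
)
```

Keeping the link on the differentiated sample only (3 points to 1, 4 points to 2) mirrors the direction described above.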

twang15 commented 2 years ago

Meeting memo with Annika, 2021-12-02

Tao_Ngn2 submission.docx

twang15 commented 2 years ago

Hi Tao,

How is the WTC-11 Ngn2 submission going? I think you wanted to try it on the test server. Did this happen?

John delivered the new SREQ-447 and this is Ngn2 as well.

Could you send me the link to the test server so that we can see if something needs to be changed?

Thanks, Annika

twang15 commented 2 years ago

Hi Annika,

I have been busy with other submissions, but I will make progress on this one this week. Right now, the most urgent task is to correct the errors that were caused by mistakes at the sequencing center; 4 ChIP experiments are in this category.

Also, for SREQ-432, could you please confirm that the right SREQ-432 was indeed submitted?

Best, Tao

twang15 commented 2 years ago

{ "title": "Originated from", "description": "A biosample that the sample was orginated from.", "comment": "See biosample.json for available identifiers.", "type": "string", "linkTo": "Biosample" }

twang15 commented 2 years ago

Hi Annika,

For the NGN2 submission, one biosample may originate from another one. On the ENCODE portal, there are several ways to represent this information:

originated_from, parent_of, part_of

It looks to me like originated_from is the right one to model this relationship. What do you think about this?

twang15 commented 2 years ago

Hi Annika,

Please check out this submission for cs-566: https://test.encodedcc.org/experiments/TSTSR332516/

Several issues:

  1. If the biosample is human tissue, please let me know the NIC number.
  2. Does this biosample have IPs? If so, please let me know where to find them.
  3. How should we address the two warnings?

    Antibody ENCAB728YTO has characterization attempts but does not have the full complement of characterizations meeting the standard in this cell type and organism: Awaiting submission of primary characterization(s).

    Genetic modification ENCGM156EZW of method TALEN is missing reagents.

Thanks, Tao
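
A small sketch for collecting warnings like the two above from a record's JSON. It assumes the record carries an `audit` mapping from a level name (for example `WARNING`) to a list of entries with `category` and `detail` keys; the `example` record is hand-written for illustration, and its category label is a placeholder.

```python
from collections import defaultdict


def summarize_audits(record):
    """Group (category, detail) pairs by audit level.

    Assumes record["audit"] maps a level name to a list of dicts,
    each with optional "category" and "detail" keys.
    """
    summary = defaultdict(list)
    for level, entries in record.get("audit", {}).items():
        for entry in entries:
            summary[level].append(
                (entry.get("category", ""), entry.get("detail", ""))
            )
    return dict(summary)


# Hand-written example record; the category string is a placeholder.
example = {
    "audit": {
        "WARNING": [
            {
                "category": "missing reagents",
                "detail": "Genetic modification ENCGM156EZW of method TALEN is missing reagents.",
            }
        ]
    }
}

for level, items in summarize_audits(example).items():
    for category, detail in items:
        print(f"{level}: [{category}] {detail}")
```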

twang15 commented 2 years ago

Manually add originated_from for biosamples:

link biosamples: 1 is the parent of 3 and 2 is the parent of 4

twang15 commented 2 years ago

GMs submission (manually add the one on the portal: ENCGM156EZW):

  a. (GM1_on_ENCODE_portal = ENCGM156EZW)
  b. (ENCGM156EZW, GM2_in_Pulsar)
  c. (ENCGM156EZW)
  d. (ENCGM156EZW, GM2_in_Pulsar)

twang15 commented 2 years ago

Missing PCRs:

ENCGM156EZW is missing characterization validating the modification. https://test.encodedcc.org/genetic-modifications/ENCGM156EZW/

Biosample TSTBS884133 which has been modified by genetic modification ENCGM187KHY is missing characterization validating the modification.

twang15 commented 2 years ago

PCR for ENCGM187KHY is posted manually.

twang15 commented 2 years ago

IPs for the biosamples of cs-555 are posted manually.

twang15 commented 2 years ago

Manually post NIC

twang15 commented 2 years ago

Hi Tao,

Thanks for trying the submission!

  1. NIC number is NIC00045
  2. Donor is human (there’s an audit about inconsistent organism on the test site)
  3. We used the same AB that we always use for ChIP-seq. I don’t know why there’s an audit on the AB. Maybe that’s only on the test site? The same AB on the production site has no audit, I believe.
  4. Genetic modification is missing reagents: you can ignore this.
  5. The ones that you submitted are the non-differentiated biosamples, right? How do we link them with the differentiated? Probably experimental series? Can you try this?
  6. There should be an IP on the undifferentiated samples on Pulsar from Cory.

Thanks, Annika

twang15 commented 2 years ago

Hi Annika,

Here are all related records for one NGN2 submission:

  1. undifferentiated: https://test.encodedcc.org/experiments/TSTSR288270/
  2. differentiated: https://test.encodedcc.org/experiments/TSTSR332516/
  3. experiment series: https://test.encodedcc.org/experiment-series/TSTSR167681/

Please take a look and let me know how it looks.

Thanks, Tao

twang15 commented 2 years ago

Experiment series sheet: https://docs.google.com/spreadsheets/d/1bYJ8w2XqfR9El0dXEl9QJKXAjy58zL8b/edit#gid=2018381402

twang15 commented 2 years ago

Hi Tao,

I think this looks good. Could you check with the DCC if they think this is ready for the production site?

Thanks, Annika

twang15 commented 2 years ago

Hi Ingrid,

I have submitted an NGN2 dataset on the ENCODE test site. Could you help us check whether the submission is OK?

Here are all related records for one NGN2 submission:

  1. undifferentiated: https://test.encodedcc.org/experiments/TSTSR288270/
  2. differentiated: https://test.encodedcc.org/experiments/TSTSR332516/
  3. experiment series: https://test.encodedcc.org/experiment-series/TSTSR167681/

Thanks, Tao

twang15 commented 2 years ago

Hi Tao,

The originated_from relationship of the biosamples looks good to me. However, for differentiated biosamples, we typically want to indicate the differentiation using documents and/or Treatments, or by using a biosample_ontology that is in the "in vitro differentiated cells" category. An example of a differentiated biosample with Treatments is: https://www.encodeproject.org/biosamples/ENCBS296AAA/

Is there a differentiation protocol document or Treatment metadata that could be added to these biosamples?

Additionally, the datasets could be submitted to a Differentiation Series, which indicates a more specific purpose for the series than Experiment Series. Here's a link to the schema page: https://www.encodeproject.org/profiles/differentiation_series

The submission format is very similar to Experiment Series. Please let me and Ingrid know if you have any questions!

Best, Jennifer
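
A rough sketch of what a series payload could look like, given that the submission format is described as very similar to Experiment Series. It assumes the series schema groups its members under `related_datasets`; the `lab` and `award` values are hypothetical placeholders, and the two accessions are the test-server experiments mentioned earlier in this thread.

```python
def build_series_payload(related_datasets, lab, award):
    """Return a dict shaped like a series submission.

    Duplicate dataset identifiers are dropped while keeping their order.
    `lab` and `award` must be supplied by the submitter.
    """
    return {
        "related_datasets": list(dict.fromkeys(related_datasets)),
        "lab": lab,
        "award": award,
    }


payload = build_series_payload(
    ["TSTSR288270", "TSTSR332516"],  # undifferentiated, differentiated
    lab="/labs/example-lab/",        # hypothetical placeholder
    award="/awards/EXAMPLE/",        # hypothetical placeholder
)
print(payload)
```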

twang15 commented 2 years ago

Hi Jennifer,

Thanks for pointing out these issues!

I have added the in vitro differentiated ontology and a differentiation series: https://test.encodedcc.org/differentiation-series/TSTSR032623/

Best, Tao

twang15 commented 2 years ago

I added the differentiation protocol: /documents/294a5615-565c-42d2-a387-5b6b908e8d56/

Tao, this should be attached to the differentiated experiment.

Thanks, Annika

twang15 commented 2 years ago

Hi Annika,

In addition to SREQ-447, which other sequencing requests belong to the WTC-11 Ngn2 submission?

Thanks, Tao

twang15 commented 2 years ago

All NGN2s are in the spreadsheet: https://docs.google.com/spreadsheets/d/10ed-wf_q34GQ2h7d66vaUyGq1wuxSwH-/edit#gid=221013195

twang15 commented 2 years ago

Hi Annika,

For TF JUN of the WTC11-NGN2 submission, I can find only the undifferentiated reps. Could you please let me know where the differentiated reps are located?

https://docs.google.com/spreadsheets/d/10ed-wf_q34GQ2h7d66vaUyGq1wuxSwH-/edit#gid=221013195

Thanks, Tao

twang15 commented 2 years ago

https://pulsar-encode.herokuapp.com/chipseq_experiments/555

Hi Annika,

Could you please add the NIC to this biosample? https://pulsar-encode.herokuapp.com/biosamples/13734

Thanks, Tao

twang15 commented 2 years ago

cs-555 is missing genetic tags https://www.encodeproject.org/experiments/ENCSR110BQW/

twang15 commented 2 years ago

cs-556 was submitted. SREQ-432 was originally SREQ-430 (SREQ-431 after correction); it looks like John has replaced it. Waiting for his md5sum confirmation.

cs-556 uses SREQ-431-AGTCAA_S8_L002_R1_001.fastq.gz as the wild-type control, which was submitted as part of SREQ-432: https://www.encodeproject.org/files/ENCFF882WED/
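
A minimal sketch of the kind of md5sum check mentioned above, using only the Python standard library. The function names are illustrative, and the expected checksum would come from whoever delivered the fastq file.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path, expected_md5):
    """Compare a local file against a reported checksum (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte fastq.gz files.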

twang15 commented 2 years ago

Hi All,

Here is my understanding of what happened here:

The sample sheet for the first sequencing run (211012_A00509_0356_AHN2N5DRXY) initially had SREQ-431 in lane 1 and SREQ-432 in lane 2. This was incorrect: the correct samples should have been SREQ-430 in lane 1 and SREQ-431 in lane 2. So both initial sets of fastq files for this run were incorrect.

The sample sheet for the second sequencing run (211104_A00509_0378_BHMYV2DRXY) should have SREQ-432 in lane 1 and SREQ-433 in lane 2. This run was uploaded on Nov 7 and should be the correct fastq files for these samples.

Does this make sense or am I missing something?

Best, John
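
The lane assignments described in the comment above can be written down and compared as follows. This is only a sketch of that description; the variable and function names are illustrative.

```python
RUN1 = "211012_A00509_0356_AHN2N5DRXY"
RUN2 = "211104_A00509_0378_BHMYV2DRXY"

# Sample sheet as first used for run 1 (reported as incorrect).
initial_sheet = {RUN1: {1: "SREQ-431", 2: "SREQ-432"}}

# Assignments reported as correct for both runs.
correct_sheet = {
    RUN1: {1: "SREQ-430", 2: "SREQ-431"},
    RUN2: {1: "SREQ-432", 2: "SREQ-433"},
}


def find_mislabeled(initial, correct):
    """Return (run, lane, wrong_label, right_label) for each lane that differs."""
    problems = []
    for run, lanes in initial.items():
        for lane, label in sorted(lanes.items()):
            expected = correct.get(run, {}).get(lane)
            if expected is not None and expected != label:
                problems.append((run, lane, label, expected))
    return problems


mislabeled = find_mislabeled(initial_sheet, correct_sheet)
for run, lane, wrong, right in mislabeled:
    print(f"{run} lane {lane}: labeled {wrong}, should be {right}")
```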

twang15 commented 2 years ago

https://www.encodeproject.org/files/ENCFF882WED/ should be Read 1 of https://www.encodeproject.org/experiments/ENCSR548RFU/

https://www.encodeproject.org/files/ENCFF549GBP/ should be Read 2 of https://www.encodeproject.org/experiments/ENCSR548RFU/

Read 1 and Read 2 of library ENCLB809IBI of https://www.encodeproject.org/experiments/ENCSR744RAS/ have not been submitted yet.
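
A sketch for checking the intended placement of the two files listed above against file records. It assumes each record is a dict with `accession`, `dataset`, and `paired_end` keys, as in the portal's file JSON; the sample record in the usage lines is hypothetical.

```python
# Intended placement, taken from the two lines above.
EXPECTED = {
    "ENCFF882WED": {"dataset": "/experiments/ENCSR548RFU/", "paired_end": "1"},
    "ENCFF549GBP": {"dataset": "/experiments/ENCSR548RFU/", "paired_end": "2"},
}


def misplaced_files(file_records, expected=EXPECTED):
    """Return accessions whose dataset or read number differs from `expected`.

    Records for accessions that are not listed in `expected` are ignored.
    """
    wrong = []
    for record in file_records:
        want = expected.get(record.get("accession"))
        if want is None:
            continue
        if any(record.get(key) != value for key, value in want.items()):
            wrong.append(record["accession"])
    return wrong


# Hypothetical record, for illustration only.
records = [
    {"accession": "ENCFF882WED", "dataset": "/experiments/EXAMPLE/", "paired_end": "1"},
]
print(misplaced_files(records))
```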

twang15 commented 2 years ago

Hi Jennifer,

There is a data swap on the ENCODE portal involving the following datasets. Could you please help move the fastq files around?

https://www.encodeproject.org/files/ENCFF882WED/ should be Read 1 of https://www.encodeproject.org/experiments/ENCSR548RFU/

https://www.encodeproject.org/files/ENCFF549GBP/ should be Read 2 of https://www.encodeproject.org/experiments/ENCSR548RFU/

Read 1 and Read 2 of library ENCLB809IBI of https://www.encodeproject.org/experiments/ENCSR744RAS/ have not been submitted yet.

After you move the above two files, please let me know so that I can submit the new datasets.

Thanks, Tao

twang15 commented 2 years ago

cs-556 of NGN2 differentiated, paired w/ cs-555 https://www.encodeproject.org/experiments/ENCSR431UCR/

Issues:

  1. Inconsistent genetic modification tags
  2. Warning: Missing input control

twang15 commented 2 years ago

Hi Jennifer,

Thank you for moving the dataset. The new datasets for https://www.encodeproject.org/experiments/ENCSR744RAS/ have been submitted and are ready for reprocessing.

Best, Tao

twang15 commented 2 years ago

All 6 biosamples should have GM "/genetic-modifications/ENCGM156EZW/".

cs-555: https://www.encodeproject.org/experiments/ENCSR110BQW/

cs-556: https://www.encodeproject.org/experiments/ENCSR431UCR/

Experiment series: https://www.encodeproject.org/experiment-series/ENCSR575GUR/

twang15 commented 2 years ago

Hi Jennifer,

Here is an experiment series submission: https://www.encodeproject.org/experiments/ENCSR431UCR/ Could you please check the quality and let me know whether there is any issue?

Thanks a lot!

Best, Tao

twang15 commented 2 years ago

After Jennifer's confirmation, I can go ahead and submit more WTC11 NGN2 Diff datasets, following the same procedure.

  1. All Diff: https://docs.google.com/spreadsheets/d/10ed-wf_q34GQ2h7d66vaUyGq1wuxSwH-/edit#gid=221013195
  2. Experiment series sheet: https://docs.google.com/spreadsheets/d/1bYJ8w2XqfR9El0dXEl9QJKXAjy58zL8b/edit#gid=2018381402

twang15 commented 2 years ago

ENCODE data submission: WTC11 NGN2 Diff

ATF3

There is no experiment in Pulsar for ATF3.

ATF3 WTC11-Ngn2 rep 1
ATF3 WTC11-Ngn2 rep2
ATF3 WTC11-Ngn2 rep1-dif
ATF3 WTC11-Ngn2 rep2-dif

twang15 commented 2 years ago

WT

What are their Pulsar IDs (biosample, library, experiment, etc.)?

WT WTC11-Ngn2 Batch1
WT WTC11-Ngn2 Batch1, diff
WT WTC11-Ngn2 Batch2
WT WTC11-Ngn2 Batch2,diff

twang15 commented 2 years ago

EGR1

https://pulsar-encode.herokuapp.com/chipseq_experiments/565 https://pulsar-encode.herokuapp.com/chipseq_experiments/566

FOS

https://pulsar-encode.herokuapp.com/chipseq_experiments/542 https://pulsar-encode.herokuapp.com/chipseq_experiments/553

JDP2

https://pulsar-encode.herokuapp.com/chipseq_experiments/561 https://pulsar-encode.herokuapp.com/chipseq_experiments/562

RFX5

https://pulsar-encode.herokuapp.com/chipseq_experiments/563 https://pulsar-encode.herokuapp.com/chipseq_experiments/564


FOSL1

https://pulsar-encode.herokuapp.com/chipseq_experiments/541, w/o replicates https://pulsar-encode.herokuapp.com/chipseq_experiments/560