StanfordBioinformatics / pulsar_lims

A LIMS for ENCODE submitting labs.

Pulsar New feature: develop client code for snRNA submission #117

Closed: twang15 closed this issue 3 years ago

twang15 commented 3 years ago

New client code for snRNA submission needs to be developed.

Requirement:

  1. Two files have to be submitted for snRNA: R1 and R2.
  2. Files are identified by their barcode and read number (R1 or R2).
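A minimal Python sketch of these two requirements, assuming each FASTQ is described by a dict with hypothetical `barcode`, `read` and `path` keys (these are illustrative names, not Pulsar or ENCODE fields). It groups files per barcode and refuses to proceed unless each barcode has exactly one R1 and one R2.

```python
from collections import defaultdict

# Hypothetical record shape: {"barcode": ..., "read": "R1" | "R2", "path": ...}.
REQUIRED_READS = {"R1", "R2"}


def pair_snrna_fastqs(files):
    """Group FASTQ records by barcode and require both R1 and R2 per barcode.

    Returns {barcode: {"R1": path, "R2": path}} and raises ValueError for an
    unknown read number, a duplicated read, or a missing read.
    """
    pairs = defaultdict(dict)
    for f in files:
        read = f["read"]
        if read not in REQUIRED_READS:
            raise ValueError(f"unexpected read number {read!r} for {f['path']}")
        if read in pairs[f["barcode"]]:
            raise ValueError(f"duplicate {read} for barcode {f['barcode']}")
        pairs[f["barcode"]][read] = f["path"]

    # Every barcode must end up with both reads before anything is submitted.
    for barcode, reads in pairs.items():
        missing = REQUIRED_READS - set(reads)
        if missing:
            raise ValueError(f"barcode {barcode} is missing {sorted(missing)}")
    return dict(pairs)
```

Failing early on an incomplete pair keeps a half-submitted snRNA experiment (R1 without R2) from ever reaching the portal.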
twang15 commented 3 years ago

Depends on https://github.com/nathankw/pulsar_lims/issues/104

twang15 commented 3 years ago

Hi Annika,

Could you please add barcodes for these libraries so that I can continue code development for snRNA?

https://pulsar-encode.herokuapp.com/sequencing_requests/310/libraries_index

Thanks a lot!

Best, Tao

twang15 commented 3 years ago

Hi Minyi,

Could you add the barcodes to the libraries below, please?

Thank you! Annika

twang15 commented 3 years ago

Done!

Thanks,

Minyi

twang15 commented 3 years ago

Minyi and Annika, thanks for taking care of this so fast!

Best, Tao

twang15 commented 3 years ago

Complexity:

Some datasets' quality is not good enough for submission. There is a meeting on Thursday to hand-pick them further.

twang15 commented 3 years ago

snRNA multiome and snRNA non-multiome (standalone snRNA) share the same submission logic. ChIP and bulk ATAC-seq share the same submission logic.
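One way to express "shared submission logic" is a dispatch table where several assay labels point at the same submitter. This is a hypothetical sketch: the assay labels and the `submit_*` functions are placeholders, not functions from the Pulsar client.

```python
# Hypothetical submitters; real ones would build and POST the ENCODE objects.
def submit_snrna(library):
    return f"snRNA submission for {library}"


def submit_chip_or_bulk_atac(library):
    return f"ChIP/bulk ATAC submission for {library}"


# Multiome snRNA and standalone snRNA share one path; ChIP and bulk ATAC another.
SUBMITTERS = {
    "snRNA-multiome": submit_snrna,
    "snRNA": submit_snrna,
    "ChIP-seq": submit_chip_or_bulk_atac,
    "bulk ATAC-seq": submit_chip_or_bulk_atac,
}


def submit(assay, library):
    """Route a library to the submission logic registered for its assay."""
    try:
        submitter = SUBMITTERS[assay]
    except KeyError:
        raise ValueError(f"no submission logic registered for {assay!r}") from None
    return submitter(library)
```

Adding another assay that reuses an existing path is then a one-line change to `SUBMITTERS`.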

twang15 commented 3 years ago

Submission done: https://test.encodedcc.org/experiments/TSTSR544677/

Ready for correctness verification from Annika.

twang15 commented 3 years ago

Hi Annika and Yunhai,

I’ve finished implementing the submission code for single-nucleus RNA-seq. Could you please help check the correctness of the following test case?

https://test.encodedcc.org/experiments/TSTSR544677/

Thanks a lot!

Best, Tao

twang15 commented 3 years ago

Hi Tao,

1) One problem is that we have an inconsistent platforms audit. Everything was sequenced on the NovaSeq6000 (this is true for all sc and multiome experiments). Not sure where the HiSeq4000 came from? Is this inconsistent in Pulsar?

2) The size range in Pulsar for scRNA libraries is 300-800, but it’s different in the test submission.

3) It has the multiome protocol attached, but it should be the GEX protocol from 10x (this is DOC-66 in Pulsar) for scRNA experiments.

Besides this, it looks good.

Thank you, Annika
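Both the platform and size-range problems could be caught before submission by comparing the outgoing values with what Pulsar holds and by checking that all files in an experiment report one platform. A minimal sketch under stated assumptions: records are plain dicts, and the `platform` and `size_range` keys are hypothetical names chosen for illustration.

```python
# Hypothetical field names; adjust to whatever the client actually submits.
FIELDS_TO_CHECK = ("platform", "size_range")


def find_mismatches(pulsar_record, submission):
    """Return {field: (pulsar_value, submitted_value)} for every disagreement."""
    return {
        field: (pulsar_record.get(field), submission.get(field))
        for field in FIELDS_TO_CHECK
        if pulsar_record.get(field) != submission.get(field)
    }


def inconsistent_platforms(files):
    """Return the set of platforms if the files disagree, else an empty set."""
    platforms = {f["platform"] for f in files}
    return platforms if len(platforms) > 1 else set()
```

For example, a Pulsar record with `NovaSeq6000` and a submission carrying `HiSeq4000` yields `{"platform": ("NovaSeq6000", "HiSeq4000")}`, which mirrors the inconsistent-platforms audit described above.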

twang15 commented 3 years ago

Hi Annika,

Thanks for your help.

1) HiSeq4000 is listed as the platform in SREQ-310: https://pulsar-encode.herokuapp.com/sequencing_requests/310, so I guess this was a mis-input, and I can fix it.

2) You are right about the size range. I am not sure how that number ended up in the submission, but the code now pulls this information correctly.

3) DOC-64 and DOC-65 are listed as the documents for this library: https://pulsar-encode.herokuapp.com/libraries/11397. Could this be another mis-input?

Best, Tao

twang15 commented 3 years ago

Hi Tao,

  1. I changed it in Pulsar.
  2. Thanks for fixing the code.
  3. I was wrong, sorry. It’s the right protocol. All good.

A.

twang15 commented 3 years ago

Hi Annika and Tao,

Just to add one more thing: if the plan in the other email sounds good to Annika (we still need her confirmation), we should:

1) Use "single-cell RNA sequencing assay" instead of "single-nucleus RNA-seq" as the Experiment assay_term_name.
2) Mark all biosamples with "subcellular_fraction_term_name" = "nucleus".

Best, Yunhai
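The first point of that plan amounts to a mapping from the lab's internal assay label to the ENCODE assay_term_name. A minimal sketch, assuming a hypothetical internal label `"snRNA"`; only the assay_term_name field is shown, and a real ENCODE experiment needs other required fields that are omitted here.

```python
# Internal label -> ENCODE assay_term_name, per the plan in this thread.
# The "snRNA" key is a hypothetical internal label.
ASSAY_TERM_NAMES = {
    "snRNA": "single-cell RNA sequencing assay",
}


def experiment_fields(assay):
    """Return the assay_term_name fragment of an experiment payload."""
    try:
        return {"assay_term_name": ASSAY_TERM_NAMES[assay]}
    except KeyError:
        raise ValueError(f"no assay_term_name mapping for {assay!r}") from None
```

The nucleus itself is not encoded in the assay name; it is recorded on the biosample through subcellular_fraction_term_name.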

twang15 commented 3 years ago

Hi Annika,

Thank you for confirming.

Hi Tao,

Please let us know if there is anything unclear about that.

Best, Yunhai

twang15 commented 3 years ago

Hi Yunhai and Annika,

It is clear to me, and I will implement the new "assay_term_name" and "subcellular_fraction_term_name" in my code.

Best, Tao

twang15 commented 3 years ago

Hi Annika and Yunhai,

Could you please confirm the answers to the following questions?

Questions about "subcellular_fraction_term_name":

  1. For single-cell ATAC-seq / single-nucleus ATAC-seq, what value should "subcellular_fraction_term_name" hold?
  2. For a single-cell RNA sequencing assay that is actually performed on single cells, what value should "subcellular_fraction_term_name" hold?
  3. If I understand correctly, for a single-cell RNA sequencing assay that is actually performed on single nuclei, "subcellular_fraction_term_name" = "nucleus", right?
  4. What about bulk ATAC and ChIP: do they also need "subcellular_fraction_term_name"? If so, what should their respective values be?
  5. Children of the same parent biosample could be used for both a single-cell assay and a bulk assay at the same time. What value should the parent biosample hold for "subcellular_fraction_term_name"?
  6. Children of the same parent biosample could be used for both single-cell ATAC/RNA assays and a single-nucleus RNA assay at the same time. What value should the parent biosample hold for "subcellular_fraction_term_name"?

Thanks a lot!

Best, Tao

twang15 commented 3 years ago

Hi all,

Sorry for the late reply.

  1. Annika: I think we just don’t fill that out? But if we have to, then it will be "nucleus" for ATAC.
     Yunhai: Right, I don't think you have to, since it can only be nucleus. But if you want to for completeness, we welcome that, and "nucleus" is indeed the right term.

  2. Annika: We don’t do these kinds of assays at the moment. @Yunhai, what would it be if we add those assays in the future?
     Yunhai: For single-cell RNA-seq (NOT single-nucleus RNA-seq), don't put anything in "subcellular_fraction_term_name", since it's the whole cell, not a fraction. In principle, if you ever do a cytosol-only RNA-seq, please feel free to put "cytosol" there.

  4. Annika: For bulk ATAC it could be "nucleus" as well, but I don’t think it’s crucial (because ATAC is by default always nucleus). For ChIP I wouldn’t fill this out at all.
     Yunhai: Same as 1, and I agree. It's not crucial, but you can do it. No one else does this, though, so we (meaning the DCC) might need to think about consistency later.

  5. Annika: Isn’t the subcellular fraction part of the experiment and not of the biosamples?
     Yunhai: Sorry about the confusion, Annika. It's on the biosample and carried over to the experiment. For single-cell multiome, since you use the 10x kit, the nuclei are prepared before they go into the procedure that prepares the ATAC and RNA libraries. So the parent is nucleus. For snATAC vs. bulk ATAC, Annika can explain, since I don't know the details. But "nucleus" is not important for ATAC anyway.

  6. Yunhai: I don't know how you would do three assays together. Maybe Annika knows?

Best, Yunhai
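The answers above can be condensed into one rule for the biosample field. A minimal sketch under stated assumptions: the assay labels are hypothetical, `fill_optional` is an invented switch for the cases where Yunhai said filling in "nucleus" is welcome but not required (ATAC), and ChIP is left unset following Annika's reply.

```python
NUCLEUS = "nucleus"


def subcellular_fraction(assay, fill_optional=False):
    """Return the subcellular_fraction_term_name for an assay, or None to omit it."""
    if assay in ("snRNA", "multiome"):
        return NUCLEUS  # nuclei are isolated before library prep
    if assay in ("scATAC", "snATAC", "bulk ATAC"):
        # ATAC is always nuclear, so the field is optional here.
        return NUCLEUS if fill_optional else None
    if assay in ("scRNA", "ChIP"):
        return None  # whole cell, or not filled in for ChIP
    raise ValueError(f"no subcellular fraction rule for {assay!r}")


def biosample_fields(assay, fill_optional=False):
    """Build the biosample payload fragment, omitting the field when unset."""
    fraction = subcellular_fraction(assay, fill_optional)
    return {} if fraction is None else {"subcellular_fraction_term_name": fraction}
```

Omitting the key entirely, rather than sending an empty value, matches the advice to "not put anything" for whole-cell assays.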

twang15 commented 3 years ago

Ready for testing again: https://test.encodedcc.org/biosamples/TSTBS703462/

twang15 commented 3 years ago

Hi Yunhai,

Thanks for your feedback. Here is a brand-new submission for single-nucleus RNA-seq. Could you please help verify its correctness? https://test.encodedcc.org/experiments/TSTSR221597/

Thanks, Tao

twang15 commented 3 years ago

Hi Tao, this looks good to me. Thank you! Annika

twang15 commented 3 years ago

Annika,

Thanks a lot for your confirmation! The submission code for snRNA is now ready for production.