christopher-vollmers / AIRR-single-cell


Script missing #1

Closed JYLeeBioinfo closed 3 years ago

JYLeeBioinfo commented 3 years ago

Hello!

I am trying to run the Ig and TCR analysis code in this repo, but it seems to fail because IGLWrapper_simple.py is not in the repo.

Could you help me with this?

[image: Wrapper.py]

christopher-vollmers commented 3 years ago

Just added the script. Sorry for the oversight. Let me know if you have any more issues!

JYLeeBioinfo commented 3 years ago

Thank you for your prompt reply! I'll try the updated version.

christopher-vollmers commented 3 years ago

Generally, T cells make fewer receptor transcripts than B cells. We talk about that a little bit in the paper. To be perfectly honest, I picked those boundaries by hand to contain the loci as they appear on the genome browser. It is quite possible that I was a bit off at the V segment ends of the loci. What matters most is that the boundaries contain the constant regions, as those are the only parts to which reads are more or less guaranteed to align.
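For readers who want to verify the point about constant regions, here is a minimal sketch of a containment check: given hand-picked locus boundaries and a set of gene intervals, it lists the genes that do not fall entirely inside any boundary. The `Interval` type, the function names, and all coordinates are hypothetical placeholders for illustration; they are not taken from the repository or from a real genome build.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def contains(region: Interval, gene: Interval) -> bool:
    """Return True if `gene` lies entirely inside `region` on the same chromosome."""
    return (
        region.chrom == gene.chrom
        and region.start <= gene.start
        and gene.end <= region.end
    )


def uncovered(regions: List[Interval], genes: Dict[str, Interval]) -> List[str]:
    """Return the names of genes that are not fully contained in any region."""
    return [
        name
        for name, gene in genes.items()
        if not any(contains(region, gene) for region in regions)
    ]


# Placeholder coordinates for illustration only, not real locus positions.
regions = [Interval("chrA", 1_000, 9_000)]
genes = {
    "constant_gene": Interval("chrA", 8_000, 8_500),  # inside the boundary
    "distal_v_gene": Interval("chrA", 500, 1_200),    # sticks out at the V end
}

print(uncovered(regions, genes))
```

In practice the gene intervals would come from an annotation file, and the check would be run for the constant-region genes first, since those are the ones the boundaries most need to contain.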

On Wed, Mar 3, 2021 at 11:50 PM hd00ljy notifications@github.com wrote:

Could I ask why you excluded some of the IG and TR genes when you defined the regions for the initial read filtering step?

At the extracted SAM file level, I get high read counts for B cells at the IGL, IGK, and IGH regions (mostly >20) but very low read counts for T cells at the TRA and TRB regions (mostly 0, or 1 to 3 if any).

So I overlapped the regions with all genes whose gene type is IMGT-related, as described in the GENCODE documentation (https://www.gencodegenes.org/pages/biotypes.html), and found that some genes don't overlap the regions.

[image: image] https://user-images.githubusercontent.com/13775022/109928049-6012fe80-7d08-11eb-9769-74ba2fc74a5c.png

I tried counting reads including those excluded genes, but the read counts didn't increase dramatically.


JYLeeBioinfo commented 3 years ago

Thank you for your reply!

By the way, I accidentally deleted my original question, so I am pasting it here again:

Could I ask why you excluded some of the IG and TR genes when you defined the regions for the initial read filtering step?

At the extracted SAM file level, I get high read counts for B cells at the IGL, IGK, and IGH regions (mostly >20) but very low read counts for T cells at the TRA and TRB regions (mostly 0, or 1 to 3 if any).
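A count like the one described above could be reproduced roughly as follows. This is a minimal sketch, not the repository's own filtering code: it assumes a plain-text SAM file, assigns each mapped read by its leftmost aligned position only (it does not parse the CIGAR string), and counts secondary and supplementary alignments like any other record. The region names and coordinates are hypothetical placeholders.

```python
from typing import Dict, Iterable, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def count_reads_per_region(
    sam_lines: Iterable[str], regions: Dict[str, Region]
) -> Dict[str, int]:
    """Count mapped SAM records whose leftmost position falls inside each region."""
    counts = {name: 0 for name in regions}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # SAM has 11 mandatory columns
            continue
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:            # 0x4 marks an unmapped read
            continue
        for name, (r_chrom, r_start, r_end) in regions.items():
            if chrom == r_chrom and r_start <= pos <= r_end:
                counts[name] += 1
    return counts


# Placeholder records and coordinates for illustration only.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchrA\t1500\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchrB\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
regions = {"locus_1": ("chrA", 1000, 9000), "locus_2": ("chrB", 1000, 2000)}

print(count_reads_per_region(sam, regions))
```

With a real file, the lines could be supplied by iterating over an open file handle, for example `count_reads_per_region(open(path), regions)`.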

So I overlapped the regions with all genes whose gene type is IMGT-related, as described in the GENCODE documentation (https://www.gencodegenes.org/pages/biotypes.html), and found that some genes don't overlap the regions.

[image: image] https://user-images.githubusercontent.com/13775022/109928049-6012fe80-7d08-11eb-9769-74ba2fc74a5c.png

I tried counting reads including those excluded genes, but the read counts didn't increase dramatically.
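The overlap check described in this question can be sketched as below. It is an illustration under stated assumptions rather than the exact procedure used: it reads gene records from a GENCODE-style GTF, keeps those whose `gene_type` attribute starts with `IG_` or `TR_` (the prefix shared by the immunoglobulin and T cell receptor biotypes on the GENCODE biotypes page), and reports the genes that overlap none of the filter regions. The function names, gene names, and coordinates are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive

# Matches GTF attributes of the form: key "value"
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def immune_genes(gtf_lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (gene_name, chrom, start, end) for IG_* and TR_* gene records."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if attrs.get("gene_type", "").startswith(("IG_", "TR_")):
            yield attrs.get("gene_name", "?"), fields[0], int(fields[3]), int(fields[4])


def overlaps(chrom: str, start: int, end: int, region: Region) -> bool:
    """Return True if the closed interval [start, end] overlaps the region."""
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start <= r_end and r_start <= end


def genes_outside(gtf_lines: Iterable[str], regions: List[Region]) -> List[str]:
    """Return names of IG/TR genes that overlap none of the given regions."""
    return [
        name
        for name, chrom, start, end in immune_genes(gtf_lines)
        if not any(overlaps(chrom, start, end, region) for region in regions)
    ]


# Placeholder annotation lines and coordinates for illustration only.
gtf = [
    "##description: placeholder annotation",
    'chrA\tHAVANA\tgene\t2000\t2500\t.\t+\t.\tgene_id "G1"; gene_type "IG_V_gene"; gene_name "geneA";',
    'chrA\tHAVANA\tgene\t9500\t9900\t.\t+\t.\tgene_id "G2"; gene_type "TR_V_gene"; gene_name "geneB";',
    'chrA\tHAVANA\tgene\t3000\t3500\t.\t+\t.\tgene_id "G3"; gene_type "protein_coding"; gene_name "geneC";',
    'chrA\tHAVANA\texon\t9500\t9900\t.\t+\t.\tgene_id "G2"; gene_type "TR_V_gene"; gene_name "geneB";',
]
filter_regions = [("chrA", 1000, 9000)]

print(genes_outside(gtf, filter_regions))
```

Genes reported by `genes_outside` are candidates for widening the boundaries, although, as noted in the reply above, widening the V segment ends may change read counts very little if most reads align to the constant regions.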