BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Questions about flair collapse in regards of TSS/TES calling algorithm #265

Open seungbeom-han opened 1 year ago

seungbeom-han commented 1 year ago

I'm currently struggling to detect the most distal polyadenylation sites, even though I used the --no_redundant longest option in flair collapse. While examining the code, I found two parts that may explain why these distal polyadenylation sites are missed.

  1. In collapse_isoforms_precise.py, transcript end positions are selected sequentially by highest read support, weighted by the read support of adjacent end positions. Assuming that positions close to a high-support transcript end are themselves likely to have higher support, I think sites[s_] should not be in the denominator. I would be grateful if you could explain the intuition behind this algorithm. https://github.com/BrooksLabUCSC/flair/blob/dca29f2144485da7152a6cb967e54da6311ded5c/src/flair/collapse_isoforms_precise.py#L276-L280

  2. In the flair.py script, I cannot set the read support parameter -s for collapse_isoforms_precise.py, so transcript ends supported by fewer than 25% of the reads supporting the intron chain are discarded. https://github.com/BrooksLabUCSC/flair/blob/dca29f2144485da7152a6cb967e54da6311ded5c/src/flair/flair.py#L567-L577
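To make the second point concrete, here is a minimal sketch of how a fractional support cutoff can drop a low-proportion distal end. This is not FLAIR's code: the function name `filter_ends_by_support`, the dictionary layout, and the read counts are hypothetical, and only the 25% figure comes from the description above.

```python
def filter_ends_by_support(end_counts, min_fraction=0.25):
    """Keep end positions supported by at least min_fraction of all reads.

    end_counts maps a transcript end position to the number of reads
    ending there, for reads sharing one intron chain (hypothetical layout).
    """
    total = sum(end_counts.values())
    if total == 0:
        return {}
    return {pos: n for pos, n in end_counts.items() if n / total >= min_fraction}


# Hypothetical counts: two proximal ends and one rarely used distal end.
end_counts = {1000: 55, 1200: 35, 1450: 10}

# With a 25% cutoff, the distal end at 1450 (10% of reads) is discarded.
kept = filter_ends_by_support(end_counts)

# A lower cutoff retains it.
kept_relaxed = filter_ends_by_support(end_counts, min_fraction=0.05)
```

Under these assumed counts, the distal site survives only when the cutoff is lowered, which is the effect that exposing -s is meant to allow.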

For your information, I am attaching an IGV snapshot of OAZ1, which shows how distal polyadenylation sites are lost after FLAIR collapse. [image] After addressing the two points above, I was able to call the distal polyadenylation sites properly. [image]

zzzyangfan commented 1 year ago

Hello, I'm having the same problem: I cannot detect the most distal APA sites. Could you please tell me how you dealt with these two points? I truly appreciate your time and assistance.

seungbeom-han commented 1 year ago

@zzzyangfan

  1. Unfortunately, for the first point, I cannot tell you how I "dealt" with it, because I'm not confident that I understand the idea behind this algorithm. In my experience, though, addressing only the second point as described below was enough.

  2. For the second point, enabling the -s option for collapse_isoforms_precise.py did the trick. Several distal APA transcripts make up a low proportion of the reads, so they do not pass the default minimum support cutoff in the collapse_isoforms_precise.py script. You can edit lines 567-568 of the flair.py script: https://github.com/BrooksLabUCSC/flair/blob/dca29f2144485da7152a6cb967e54da6311ded5c/src/flair/flair.py#L567-L568 The edited lines may look like:

    collapse_cmd = [sys.executable, path+'collapse_isoforms_precise.py', '-q', precollapse, '-t', str(args.t), 
        '-m', str(args.max_ends), '-w', str(args.w), '-n', args.n, '-s', str(min_reads), '-o', args.o+'firstpass.unfiltered.bed'] 
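
As a standalone illustration of the same idea, the sketch below builds the argument list in a small helper and appends -s only when a value is given. The helper name `build_collapse_cmd`, its parameter names, and the example paths are hypothetical; only the flags and the output suffix are copied from the edited line above, and it assumes `min_reads` is already defined in flair.py, as that line implies.

```python
import sys


def build_collapse_cmd(path, precollapse, threads, max_ends, window,
                       no_redundant, out_prefix, min_reads=None):
    """Assemble the argument list for collapse_isoforms_precise.py."""
    cmd = [sys.executable, path + 'collapse_isoforms_precise.py',
           '-q', precollapse, '-t', str(threads),
           '-m', str(max_ends), '-w', str(window), '-n', no_redundant,
           '-o', out_prefix + 'firstpass.unfiltered.bed']
    # Forward the read support cutoff only when the caller supplies one,
    # so the script's own default applies otherwise.
    if min_reads is not None:
        cmd += ['-s', str(min_reads)]
    return cmd


# Hypothetical values, for illustration only.
cmd = build_collapse_cmd('src/flair/', 'reads.bed', 4, 2, 100,
                         'longest', 'out.', min_reads=3)
```

The resulting list can be passed to `subprocess.check_call` in the usual way. Appending -s at the end rather than before -o should not matter, since the flags are order-independent options.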

I hope that this is helpful!

zzzyangfan commented 1 year ago

Thank you for your helpful response! Best regards, Zeng Yangfan
