Open seungbeom-han opened 1 year ago
I'm struggling to detect the most distal polyadenylation sites even though I used the --no_redundant longest option in flair collapse. While examining the code, I found two parts that may explain why these distal polyadenylation sites were not detected.
- In collapse_isoforms_precise.py, transcript end positions are selected sequentially by highest read support, weighted by the read support of adjacent end positions. Assuming that positions close to high-support transcript end positions are likely to have higher support themselves, sites[s_] should not be in the denominator. I would be grateful if you could explain the intuition behind this algorithm. https://github.com/BrooksLabUCSC/flair/blob/dca29f2144485da7152a6cb967e54da6311ded5c/src/flair/collapse_isoforms_precise.py#L276-L280
- In the flair.py script, I cannot set the read support parameter -s for collapse_isoforms_precise.py, so transcript ends supported by fewer than 25% of all reads supporting the intron chain are discarded. https://github.com/BrooksLabUCSC/flair/blob/dca29f2144485da7152a6cb967e54da6311ded5c/src/flair/flair.py#L567-L577
For your information, I am attaching an IGV snapshot of OAZ1, which shows how distal polyadenylation sites are lost after FLAIR collapse. I was able to call distal polyadenylation sites properly after addressing the two points above.
Hello, I'm having the same problem: I cannot detect the most distal APA site. Could you please tell me how you dealt with these two points? I truly appreciate your valuable time and assistance.
@zzzyangfan
Unfortunately, for the first point, I cannot tell you how I "dealt" with it, because I'm not confident that I understand the idea behind this algorithm. But in my experience, addressing only the second point, as described below, was enough.
For the second point, enabling the -s option for collapse_isoforms_precise.py did the trick. Several distal APA transcripts make up a low proportion of reads, so they do not pass the default minimum support cutoff in the collapse_isoforms_precise.py script. You can edit lines 567-568 of the flair.py script, linked below.
https://github.com/BrooksLabUCSC/flair/blob/dca29f2144485da7152a6cb967e54da6311ded5c/src/flair/flair.py#L567-L568
The edited line may look like:
collapse_cmd = [sys.executable, path+'collapse_isoforms_precise.py', '-q', precollapse, '-t', str(args.t),
'-m', str(args.max_ends), '-w', str(args.w), '-n', args.n, '-s', str(min_reads), '-o', args.o+'firstpass.unfiltered.bed']
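To illustrate why a low-proportion distal end gets dropped, here is a minimal sketch of a proportional support cutoff. It is not FLAIR's actual code: the function name, the toy positions and read counts, and the rule that a value below 1 is treated as a fraction while a value of 1 or more is treated as an absolute read count are all assumptions made for illustration.

```python
def filter_ends_by_support(end_counts, s=0.25):
    """Keep end positions whose read support passes the cutoff s.

    Assumption (not taken from FLAIR's source): s < 1 is read as a
    fraction of all reads supporting the intron chain, and s >= 1 is
    read as an absolute number of reads.
    """
    total = sum(end_counts.values())
    cutoff = s * total if s < 1 else s
    return {pos: n for pos, n in end_counts.items() if n >= cutoff}


# Hypothetical read counts per transcript end for one intron chain:
# a well-supported proximal end and a weakly supported distal end.
ends = {1000: 85, 1500: 15}

# Fractional cutoff of 25%: the distal end (15 of 100 reads) is discarded.
proximal_only = filter_ends_by_support(ends, s=0.25)

# Absolute cutoff of 3 reads: both ends are kept.
both = filter_ends_by_support(ends, s=3)
```

With these hypothetical counts, the 25% cutoff keeps only the proximal end, while an absolute cutoff of 3 reads keeps both, which is the kind of difference that passing -s through could make.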
I hope that this is helpful!
Thank you for your helpful response! Best regards, Zeng Yangfan