Closed WallyL closed 5 years ago
Hi @WallyL ,
--by refgene
and --by refisoform
will only work if you provide a SQANTI2 classification when you ran make_file_for_subsampling_from_collapsed.py
using the -m2
option. Otherwise, there is no reference gene or transcript info. This is most obvious by looking at your input to the rarefaction (which is the output of make_file_for_subsampling_from_collapsed.py
), the refgene
and refisoform
columns must be populated for subsample.py
to work with them.
--LIz
Thanks, Liz. I saw SQUANT2 section, but didn't catch that it was required.
So the identical count_fl count_nfl values in the last abundance output is nothing to be concerned about?
Walt
Hi Liz,
I'm running a set of human data that has been processed per the latest IsoSeq3 scripts. The Cupcake input file is HQ isoforms generated after isoseq3 polish. Minimap2 was run vanilla:
minimap2 -ax splice -t 8 -uf --secondary=no -C5 Homo_sapiens.GRCh38.dna.toplevel.fa \ AC_polished.hq.fastq > AC_hq.sam
So after I ran the Cupcake pipeline (it ran fine, w/no errors), I tried rarefaction and it failed when run with --by refgene and --by refisoform flags, i.e. no output. (--by pbgene seemed to work). So I went back and looked at some of the collapse and abundance report outputs and found the following and wanted check to see if these look okay. Here are the headers of the files:
- AC_hq.collapsed.group.txt output from _collapse_isoforms_bysam.py probably fine, just doesn't look anything like the output in your Cupcake example.
PB.1.1 transcript/23910 PB.2.1 transcript/9327 PB.2.2 transcript/9980 PB.2.3 transcript/9895 PB.2.4 transcript/16058 PB.3.1 transcript/28571 PB.4.1 transcript/29272 PB.4.2 transcript/28803,transcript/30061 PB.5.1 transcript/307 PB.5.2 transcript/397
- AC_hq.collapsed.read_stat.txt output from _get_abundance_postcollapse.py- the length field is all NA. I see the mmap2 out has
WARNING: isoseq3 format detected. Output length column will be NA
but can be resolved after the fact with _get_seqstats.py Curious if you know if there's a mmap2 flag that will fix this and/or do I need to include the lengths prior to running later steps?
id length is_fl stat pbid m54193_190813_205800/27066957/ccs NA Y unique PB.9184.3 m54193_190813_205800/25887454/ccs NA Y unique PB.9184.3 m54193_190813_205800/73859721/ccs NA Y unique PB.9184.3
- AC_hq.collapsed.abundance.txt output from _get_abundance_postcollapse.py The fl and nfl fields are all the same...
head pbid count_fl count_nfl count_nfl_amb norm_fl norm_nfl norm_nfl_amb PB.2.1 4 4 4 1.7230e-05 1.7230e-05 1.7230e-05 PB.2.2 42 42 42 1.8091e-04 1.8091e-04 1.8091e-04 PB.2.3 28 28 28 1.2061e-04 1.2061e-04 1.2061e-04 PB.2.4 2 2 2 8.6148e-06 8.6148e-06 8.6148e-06 PB.3.1 3 3 3 1.2922e-05 1.2922e-05 1.2922e-05 PB.4.1 229 229 229 9.8639e-04 9.8639e-04 9.8639e-04 tail PB.9199.1 3 3 3 1.2922e-05 1.2922e-05 1.2922e-05 PB.9199.2 3 3 3 1.2922e-05 1.2922e-05 1.2922e-05 PB.9199.3 44 44 44 1.8953e-04 1.8953e-04 1.8953e-04 PB.9199.4 7 7 7 3.0152e-05 3.0152e-05 3.0152e-05
Should I be concerned about this last output? I'll post later regarding rarefaction, if I can't get it working, but wanted to ask about this file first, since it looks like it's an input for _make_file_for_subsampling_fromcollapsed.py
Thanks, Walt
Hi. Have you solved this problem? I also confused about it and could not get the information about the refgene and refisoform columns Thanks Toney
Hi @WallyL ,
Yes it is expected count_fl
and count_nfl
are now the same. We no longer use nFL in IsoSeq3 so only the count_fl
matters.
-Liz
Hi Liz,
I'm running a set of human data that has been processed per the latest IsoSeq3 scripts. The Cupcake input file is HQ isoforms generated after isoseq3 polish. Minimap2 was run vanilla:
So after I ran the Cupcake pipeline (it ran fine, w/no errors), I tried rarefaction and it failed when run with --by refgene and --by refisoform flags, i.e. no output. (--by pbgene seemed to work). So I went back and looked at some of the collapse and abundance report outputs and found the following and wanted check to see if these look okay. Here are the headers of the files:
1) AC_hq.collapsed.group.txt output from _collapse_isoforms_bysam.py probably fine, just doesn't look anything like the output in your Cupcake example.
2) AC_hq.collapsed.read_stat.txt output from _get_abundance_postcollapse.py- the length field is all NA. I see the mmap2 out has
but can be resolved after the fact with _get_seqstats.py Curious if you know if there's a mmap2 flag that will fix this and/or do I need to include the lengths prior to running later steps?
3) AC_hq.collapsed.abundance.txt output from _get_abundance_postcollapse.py The fl and nfl fields are all the same...
Should I be concerned about this last output? I'll post later regarding rarefaction, if I can't get it working, but wanted to ask about this file first, since it looks like it's an input for _make_file_for_subsampling_fromcollapsed.py
Thanks, Walt