Hi @pcarbone, you will need a unified ID across samples for a multi-sample FL count file.
Since you have collapsed fasta/count files for each independent sample, you can chain them together using Cupcake.
Note that in the future, if you have multiplexed tissues from the same organism, another option is to run them pooled first (after removing cDNA primers, barcodes, and polyA tails) and then use the demux script to get per-tissue counts. This approach generally yields slightly more isoforms. The isoseq3 pipeline requires at least 2 FL reads (regardless of the tissue they come from) to call an isoform, so a low-abundance isoform with only 1 FL read per tissue will be missed when each tissue is analyzed separately but recovered when the tissues are pooled.
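The pooling argument above can be illustrated with a toy model. This is a minimal sketch that reduces isoform calling to a simple "at least 2 FL reads" threshold; the real isoseq3 clustering involves much more than a count cutoff, and the tissue names and counts below are hypothetical.

```python
from typing import Dict

# Simplified model (assumption): an isoform is called if it has >= MIN_FL_READS FL reads.
MIN_FL_READS = 2

# Hypothetical FL read counts for a single low-abundance isoform in three tissues.
fl_counts: Dict[str, int] = {"liver": 1, "brain": 1, "heart": 0}


def called(count: int, min_reads: int = MIN_FL_READS) -> bool:
    """Return True if an isoform with `count` FL reads passes the threshold."""
    return count >= min_reads


def called_per_tissue(counts: Dict[str, int]) -> Dict[str, bool]:
    """Apply the threshold to each tissue on its own (per-tissue analysis)."""
    return {tissue: called(c) for tissue, c in counts.items()}


def called_pooled(counts: Dict[str, int]) -> bool:
    """Apply the threshold to the summed counts (pooled analysis)."""
    return called(sum(counts.values()))


if __name__ == "__main__":
    print(called_per_tissue(fl_counts))  # {'liver': False, 'brain': False, 'heart': False}
    print(called_pooled(fl_counts))      # True
```

No single tissue reaches 2 reads, but the pooled total does, which is why the pooled run can recover isoforms that per-tissue runs miss.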
-Liz
Dear Liz,
Thank you for your rapid reply and your useful scripts.
I have already chained the FL counts using cDNA_Cupcake/build/scripts-3.7/chain_samples.py. However, I think the problem is that chain_samples.py produced only a chained count file and no chained fasta, because I did not include FASTQ_FILENAME=optional.rep.fastq in the config. All of the per-sample (demultiplexed) "isoseq3 collapse" runs produced an error and wrote only fasta, not fastq, which is why I cannot give fastq input to chain_samples.py.
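For reference, a chain_samples.py config that includes the FASTQ_FILENAME key could be generated like this. This is a hedged sketch: only the FASTQ_FILENAME key comes from this thread; the `SAMPLE=<name>;<directory>` lines and the GROUP_FILENAME, GFF_FILENAME, and COUNT_FILENAME keys are assumptions based on the Cupcake chaining tutorial and should be checked against your Cupcake version. The helper `build_chain_config`, the sample directories, and the file names are all hypothetical.

```python
from pathlib import Path
from typing import Dict


def build_chain_config(samples: Dict[str, str], filenames: Dict[str, str]) -> str:
    """Return config text for chain_samples.py (hypothetical helper).

    ASSUMPTION: layout is one "SAMPLE=<name>;<directory>" line per sample,
    a blank line, then KEY=filename lines naming the per-sample files.
    """
    lines = [f"SAMPLE={name};{directory}" for name, directory in samples.items()]
    lines.append("")  # blank line separating the sample list from the filename keys
    lines += [f"{key}={value}" for key, value in filenames.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample names and directories.
    samples = {"tissue1": "/data/tissue1", "tissue2": "/data/tissue2"}
    # Hypothetical per-sample file names; key names other than FASTQ_FILENAME are assumptions.
    filenames = {
        "GROUP_FILENAME": "collapsed.group.txt",
        "GFF_FILENAME": "collapsed.gff",
        "COUNT_FILENAME": "collapsed.abundance.txt",
        # Optional; per this thread, without it no chained sequences were written.
        "FASTQ_FILENAME": "collapsed.rep.fq",
    }
    text = build_chain_config(samples, filenames)
    Path("sample.config").write_text(text)
    print(text)
```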
Do you think that the most convenient solution at this point would be processing the pooled data and running demux script afterwards as you suggested?
Thanks again, Pablo
Hi @pcarbone,
You can convert fasta to fastq using fa2fq.py in Cupcake (see the tutorial).
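To show what such a conversion involves, here is a minimal stand-in written from scratch. It is NOT the actual fa2fq.py implementation: because FASTA carries no quality scores, this sketch simply assigns every base the same placeholder quality character ('I', i.e. Q40 in Phred+33, an arbitrary choice). The record IDs in the example are hypothetical.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

# Placeholder quality (assumption): FASTA has no qualities, so every base gets 'I' (Q40, Phred+33).
PLACEHOLDER_QUAL = "I"


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequence lines that were wrapped."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_fastq(fasta: TextIO, fastq: TextIO, qual: str = PLACEHOLDER_QUAL) -> int:
    """Write each FASTA record as a 4-line FASTQ record; return the number of records."""
    n = 0
    for header, seq in read_fasta(fasta):
        fastq.write(f"@{header}\n{seq}\n+\n{qual * len(seq)}\n")
        n += 1
    return n


if __name__ == "__main__":
    src = io.StringIO(">PB.1.1 isoform\nACGT\nAC\n>PB.2.1\nGG\n")
    out = io.StringIO()
    fasta_to_fastq(src, out)
    print(out.getvalue(), end="")
```

The header line is copied unchanged, so isoform IDs stay identical between the fasta and the resulting fastq, which is what matters when the file is fed back into chaining.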
-Liz
Hi Liz, the fa2fq.py solution worked, and so did the downstream SQANTI2 run on the chained samples! Thank you! Pablo
Hi Liz,
Sorry if I have misunderstood, but which isoforms.fasta input is needed when using a multi-sample FL count file produced by chain_samples.py? I generated a multi-sample FL count file from 10 multiplexed tissues, and I have collapsed fasta files for each demultiplexed sample. Should I somehow generate a merged fasta for all 10 samples to use as the input isoforms for SQANTI2, so they can be analyzed together with the multi-sample FL count data?
Thank you. Pablo