Closed FDallaPozza closed 2 years ago
This is very curious. Can you paste the full command used?
This is the code:
demo_yeast.fa is the genome, chromosomes.txt the list of the chromosomes and all the other the correct files
As I understand it, you have aligned your reads against a genome containing multiple chromosomes and you are then running DRUMMER in isoform mode where the 'isoforms' are the chromosome IDs?
This will give the output you observe with 'transcripID' reflecting the chromosome number and 'position' referring to the position within the chromosome.
Because you are running the analysis in isoform mode, the internal assumption is that the only alignments retained/used are those against the top strand. If you wanted similar information for the reverse strand, you would need to supply an alignment against a reverse-complemented genome file.
Given you are working with yeast, would it not be simpler to work with a transcriptome-level alignment instead of a genome-level alignment?
We'll look into adding the ability to run 'exome' level analyses against multiple chromosomes but this will take a little time to implement.
No I'm running DRUMMER in exome mode (-a exome), the code above was an example and inside chromosomes.txt there was only one chromosome. What I did then was to implement a for loop using another chromosomes.txt file (this time containing all the chromosomes) and so the code was repeated for each chromosome inside the file. Btw the first output I showed was referred only to one chromosome, as specified in the code pasted above
Really interesting question. I suspect when iterating through your text file, the bam files (-c, -t arguments) are always kept constant. The issue with this is that DRUMMER uses this information for downstream directory names, which means that previous outputs can be overridden as you progress through the loop.
Could you tell me what is contained in one of the complete_analysis folders? If there are DRUMMER outputs for each chromosome, we might be able to directly call summary.py and multiple_combine.py using these files.
However, I do think the advice of DepledgeLab is best. Since we designed DRUMMER initially for viruses with 1 "chromosome", exome mode may not be best in this case. The best course of action would be treating these different chromosomes as their own transcript and running this analysis in isoform mode.
I set the output folder with the -o option, so in every iteration the output is a folder with the name of the chromosome, like this:
For example inside the folder XVI/ :
The file multiple_comp.txt is the one I pasted in the first message, inside the folder inimap.sortG.1.1-inimap.sortG.2.1 I got :
And this shows some lines of XVI.complete.txt file inside the complete_analysis folder:
So I looked through some old data that we ran in exome mode. Looks like the output you are getting is correct.
However, for your data you could (1) Separate your bam files into fwd/rev and by using information from the resulting data (chromosome, position, strand) you can map to a transcript by using a GTF file.
Ok thank you for the help, so I think I'll run it in the isoform mode and then perform a lift-over to obtain all the genomic coordinates. Best, Fabio
Hi, I tried to use the multiple replicates isoform mode to perform an analysis, using the same command provided in the documentation, I got a weird output where it seems some columns are missing with respect to how the output should be. This is the output:
In particular, for what I need from the output, "transcriptID" and "position" seems to be referred only to the genome, and "strand" is missing. Am I doing something wrong? Best Regards. Fabio