Open bdeonovic opened 9 years ago
Hi
Thanks for trying out EMASE, and sorry that the documentation is not clear. prepare-emase can be run in two ways. Can you please explain how you are trying to use EMASE?
In the first case, prepare-emase takes a haploid genome sequence as a FASTA file and an annotation as a GTF file, and extracts the set of all transcript sequences (and also the length of each transcript).
trans1 ATGC
trans2 ATGCTAGC
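To make the first case more concrete, here is a minimal Python sketch of what "extracting transcript sequences and lengths" from a genome plus an annotation could look like. It is an illustration only, not EMASE's actual implementation: the toy genome, the exon coordinates, and the function name `extract_transcripts` are all made up, and a real run would read FASTA and GTF files instead of in-memory values.

```python
from collections import defaultdict

# Toy haploid genome: chromosome name -> sequence (hypothetical data).
genome = {"chr1": "GGATGCAATAGCTT"}

# Exon records with GTF-style coordinates (1-based, inclusive).
# Fields: transcript_id, chromosome, start, end, strand. All values are made up.
exons = [
    ("trans1", "chr1", 3, 6, "+"),
    ("trans2", "chr1", 3, 6, "+"),
    ("trans2", "chr1", 9, 12, "+"),
]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_transcripts(genome, exons):
    """Concatenate exon sequences per transcript and report each length."""
    by_transcript = defaultdict(list)
    for tid, chrom, start, end, strand in exons:
        by_transcript[tid].append((chrom, start, end, strand))

    sequences = {}
    for tid, parts in by_transcript.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            # Minus-strand transcripts are the reverse complement.
            seq = seq.translate(_COMPLEMENT)[::-1]
        sequences[tid] = seq

    lengths = {tid: len(seq) for tid, seq in sequences.items()}
    return sequences, lengths


sequences, lengths = extract_transcripts(genome, exons)
for tid in sequences:
    print(tid, sequences[tid], lengths[tid])
```

With this toy input the two transcripts come out as ATGC and ATGCTAGC, with lengths 4 and 8.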
In the second case, prepare-emase takes multiple genomes and GTF files as input, together with suffix names followed by "_", and creates a pooled transcriptome. For a diploid genome, the genome sequences (and GTF files) correspond to the maternal and paternal genomes, and the suffixes are the haplotype names used to differentiate the haplotypes in the genomes, GTF files, and the diploid transcriptome.
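As a rough illustration of the pooling idea, the Python sketch below merges two per-haplotype transcript sets into one dictionary and tags every transcript ID with its haplotype suffix. The exact ID format is an assumption here (transcript ID, underscore, suffix), as are the suffixes "M" and "P", the sequences, and the function name `pool_transcriptomes`; none of this is taken from the EMASE source.

```python
def pool_transcriptomes(haplotypes):
    """Merge per-haplotype transcript sets into one pooled transcriptome.

    `haplotypes` maps a suffix (haplotype name) to {transcript_id: sequence}.
    Pooled IDs are "<transcript_id>_<suffix>", which is an assumed convention.
    """
    pooled = {}
    for suffix, transcripts in haplotypes.items():
        for tid, seq in transcripts.items():
            pooled[f"{tid}_{suffix}"] = seq
    return pooled


# Hypothetical maternal (M) and paternal (P) transcripts; trans2 differs by one base.
maternal = {"trans1": "ATGC", "trans2": "ATGCTAGC"}
paternal = {"trans1": "ATGC", "trans2": "ATGCTGGC"}

pooled = pool_transcriptomes({"M": maternal, "P": paternal})
for name, seq in pooled.items():
    print(name, seq)
```

The point of the suffix is that both copies of each transcript stay distinguishable in the pooled set, so reads can later be assigned to one haplotype or the other.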
Hope it helps. Narayanan
Ultimately I would like to get the output noted at the bottom of the usage documentation:
‘run-emase’ outputs the following files:
${OUTBASE}.isoforms.effective_read_counts
${OUTBASE}.isoforms.tpm
${OUTBASE}.genes.effective_read_counts
${OUTBASE}.genes.tpm
Which steps do I need to follow to get to these results?
You may find this useful to get the results. http://emase.readthedocs.org/en/latest/usage.html
Thanks Narayanan
That is the documentation that I have been referencing. It is confusing. For example:
run-emase -i ${EMASE_FILE} -g ${GROUP_FILE} -L ${TINFO_FILE} -M ${MODEL} -o ${OUTBASE} \
-r ${READLEN} -p ${PSEUDOCOUNT} -m ${MAX_ITERS} -t ${TOLERANCE}
What is ${MODEL} or ${PSEUDOCOUNT}?
EMASE has four EM models for dealing with multimapped reads and we are testing them now. We recommend using model 4 by specifying "-M 4".
The pseudocount option enables Bayesian estimation of allele specificity, which we have not tested extensively. Please use a zero pseudocount by specifying "-p 0".
Narayanan
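For anyone scripting this, the recommendation above ("-M 4" and "-p 0") can be combined with the run-emase usage line quoted earlier in the thread. The Python sketch below only assembles and prints the command; it does not run EMASE. The flags are the ones from that usage line, while the file names, read length, iteration limit, and tolerance are placeholder values chosen for illustration, not documented defaults, and `build_run_emase_command` is a hypothetical helper.

```python
import shlex


def build_run_emase_command(emase_file, group_file, tinfo_file, outbase,
                            read_len, max_iters, tolerance,
                            model=4, pseudocount=0):
    """Return the run-emase argument list with model 4 and zero pseudocount."""
    return [
        "run-emase",
        "-i", emase_file,
        "-g", group_file,
        "-L", tinfo_file,
        "-M", str(model),        # EM model for multimapped reads
        "-o", outbase,
        "-r", str(read_len),
        "-p", str(pseudocount),  # 0 turns the pseudocount off
        "-m", str(max_iters),
        "-t", str(tolerance),
    ]


# All values below are placeholders, not real files or recommended settings.
cmd = build_run_emase_command(
    emase_file="sample.alignments.h5",
    group_file="gene_to_transcripts.tsv",
    tinfo_file="transcript_info.tsv",
    outbase="sample.emase",
    read_len=100,
    max_iters=999,
    tolerance=0.0001,
)
print(shlex.join(cmd))

# To actually launch it (requires EMASE to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.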
Thank you for providing the details. I understand the software is still in development. I appreciate the support.
Hi, I am interested in running your software on some RNA-seq data. The documentation for how to run it from the command line is not very good. After I run:
The usage tells me to run:
but I am not sure what GENOME1, GENOME2, GTF1, GTF2, SUFFIX1, and SUFFIX2 are.
Thanks