Open inti opened 5 years ago
Hi Inti,
Try to build the `kallisto index` using an older version of kallisto (https://pachterlab.github.io/kallisto/download), such as v0.42.1. They upgraded the index format to version 9 at some point, but our `kallisto-align` uses version 8. We will catch up with it at some point (I am considering merging it into `alntools`), but not soon, unfortunately. Thanks for using `kallisto-align`.
KB
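The mismatch described above is between the format version stored in the index file and the version the reading binary expects. Below is a minimal sketch for inspecting what a given index file reports. It assumes (this is not confirmed anywhere in this thread) that the file begins with the version stored as a native-endian 64-bit unsigned integer. `index_version` is a hypothetical helper name.

```shell
# Print the first 8 bytes of a file as one unsigned 64-bit integer.
# Assumption: a kallisto index starts with its format version in this
# layout; treat the result as a hint, not as an authoritative answer.
index_version() {
    od -A n -t u8 -N 8 "$1" | tr -d '[:space:]'
}
```

Usage would be along the lines of `index_version emase/SRR5125117/SRR5125117.k_idx`, run once per index so that the value can be compared with the "Found version" part of the error message.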
Hi,
Thanks for the response. That did not work; I got the same error as before.
I built the index with kallisto v0.42.1 and then ran `kallisto-align`:
bash-4.2$ ~/app/kallisto-align/kallisto-align -i emase/SRR5125117/SRR5125117.k_idx -f fastq/SRR5125117_1.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate
Hi, I had already done that with kallisto v0.42.1.
Sorry, I did not send the full commands I ran. Here is the output of building the index and then trying to run `kallisto-align`.
I also tried kallisto v0.42 and got the same error.
bash-4.2$ ~/app/kallisto_linux-v0.42.1/kallisto index -i emase/SRR5125117/SRR5125117.k_idx emase/SRR5125117/SRR5125117.transcripts.fa
[build] loading fasta file emase/SRR5125117/SRR5125117.transcripts.fa
[build] k-mer length: 31
[build] warning: replaced 14045 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 2065 contigs and contains 185230 k-mers
bash-4.2$ ~/app/kallisto-align/kallisto-align -i emase/SRR5125117/SRR5125117.k_idx -f fastq/SRR5125117_1.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate
I am sorry, I meant that you should check whether `kallisto quant` works fine with the same input files on v0.42.1.
kallisto 0.42.1
Computes equivalence classes for reads and quantifies abundances
Usage: kallisto quant [arguments] FASTQ-files
Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
quantification
-o, --output-dir=STRING Directory to write output to
Optional arguments:
--single Quantify single-end reads
-l, --fragment-length=DOUBLE Estimated average fragment length
(default: value is estimated from the input data)
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
I did try
bash-4.2$ ~/app/kallisto_linux-v0.42.1/kallisto index -i emase/SRR5125117/SRR5125117.transcripts.k_idx emase/SRR5125117/SRR5125117.transcripts.fa
[build] loading fasta file emase/SRR5125117/SRR5125117.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 182 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 84409 contigs and contains 21292011 k-mers
bash-4.2$ ~/app/kallisto_linux-v0.42.1/kallisto quant -i emase/SRR5125117/SRR5125117.transcripts.k_idx -o test fastq/SRR5125117_1.fastq.gz fastq/SRR5125117_2.fastq.gz
[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29493
[index] number of k-mers: 21292011
[index] number of equivalence classes: 61508
[quant] running in paired-end mode
[quant] will process pair 1: fastq/SRR5125117_1.fastq.gz
fastq/SRR5125117_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 0 reads, 0 reads pseudoaligned
[quant] estimated average fragment length: -nan
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 1 rounds
It does not work (all transcripts have 0 counts) ... :/ It also does not work with the same files and the newest version of kallisto (v0.44), nor with the reference transcriptome Bombus_terrestris.Bter_1.0.cdna.all.fa.
I have used kallisto recently, so this is odd and I did not expect it.
Not sure what is going on ...
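Given the `processed 0 reads` line in the log above, one possible first check is whether the FASTQ files themselves contain reads. This is a minimal sketch, assuming gzip-compressed input with standard four-line FASTQ records. `count_fastq_reads` is a hypothetical helper name.

```shell
# Count reads in a gzip-compressed FASTQ file.
# Assumption: every record spans exactly four lines (no wrapped sequences).
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}
```

For example, `count_fastq_reads fastq/SRR5125117_1.fastq.gz` and the same call on the `_2` file should give matching, non-zero counts for a paired-end sample.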
Anyway, it seems that your issue is not due to our `kallisto-align`. Take a look at your transcripts.fa file.
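A quick way to look at a transcripts FASTA is to count its records and the characters outside A, C, G, T and U, which is the category reported by the `[build] warning: replaced ... non-ACGUT characters` lines above. This is a minimal sketch. `fasta_summary` is a hypothetical helper name, and treating lower-case bases as valid is an assumption rather than documented kallisto behaviour.

```shell
# Summarise a FASTA file: number of records and number of characters
# that are not A, C, G, T or U (case-insensitive, an assumption).
# Note: Windows line endings (CR) would be counted as invalid characters.
fasta_summary() {
    awk '
        /^>/ { n_seq++; next }
        { seq = toupper($0); n_bad += gsub(/[^ACGTU]/, "", seq) }
        END { printf "sequences=%d non_ACGTU=%d\n", n_seq, n_bad }
    ' "$1"
}
```

Running `fasta_summary emase/SRR5125117/SRR5125117.transcripts.fa` and comparing the record count with the `number of targets` line printed by `kallisto quant` is one way to spot a truncated or malformed file.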
Sorry ... I had used kallisto recently, so I did not expect the issue to be there. Apologies again.
Using `prepare-emase` to generate the diploid transcriptome, taking as input the `SRR5125117.gtf` and `SRR5125117.fa` generated with `g2gtools`:
grep "_R" SRR5125117.gtf > SRR5125117.R.gtf
grep "_L" SRR5125117.gtf > SRR5125117.L.gtf
prepare-emase -G SRR5125117.fa,SRR5125117.fa -g SRR5125117.L.gtf,SRR5125117.R.gtf -s L,R -o test -m -x
sed -i "s/_R_R/_R/g" test/emase.pooled.transcripts.info
sed -i "s/_L_L/_L/g" test/emase.pooled.transcripts.info
sed -i "s/_R_R/_R/g" test/emase.pooled.transcripts.fa
sed -i "s/_L_L/_L/g" test/emase.pooled.transcripts.fa
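After the `sed` clean-up it can be worth confirming that the pooled FASTA has the expected haplotype suffixes and that no doubled suffix is left. This is a minimal sketch, assuming transcript IDs end in `_L` or `_R` as in the abundance tables in this thread. `haplotype_header_counts` is a hypothetical helper name.

```shell
# Count FASTA headers by haplotype suffix and flag doubled suffixes.
# Assumption: the ID is the first whitespace-delimited field of the
# header line and ends in _L or _R.
haplotype_header_counts() {
    awk '
        /^>/ {
            id = $1
            if (id ~ /_(L_L|R_R)$/) doubled++
            else if (id ~ /_L$/) left++
            else if (id ~ /_R$/) right++
            else other++
        }
        END { printf "L=%d R=%d doubled=%d other=%d\n", left, right, doubled, other }
    ' "$1"
}
```

Called as `haplotype_header_counts test/emase.pooled.transcripts.fa`, a non-zero `doubled` or `other` count would point to headers that the `sed` substitutions did not normalise.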
~/app/kallisto_linux-v0.42.1/kallisto index -i test/emase.pooled.transcripts.k_idx test/emase.pooled.transcripts.fa
[build] loading fasta file test/emase.pooled.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 44839 contigs and contains 20829208 k-mers
`quant` step:
~/app/kallisto_linux-v0.42.1/kallisto quant -i test/emase.pooled.transcripts.k_idx -o test_kallisto ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz
[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29496
[index] number of k-mers: 20829208
[index] number of equivalence classes: 56014
[quant] running in paired-end mode
[quant] will process pair 1: ../../fastq/SRR5125122_1.fastq.gz
../../fastq/SRR5125122_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 23476745 reads, 12434421 reads pseudoaligned
[quant] estimated average fragment length: 156.926
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 1088 rounds
`quant` output:
head test_kallisto/abundance.txt
target_id length eff_length est_counts tpm
ENSRNA049756373-T1_L 91 91 0.5 0.939
ENSRNA049756376-T1_L 86 86 1.5 2.98078
ENSRNA049756377-T1_L 101 101 0 0
ENSRNA049756378-T1_L 119 119 0 0
ENSRNA049756379-T1_L 141 141 0 0
ENSRNA049756380-T1_L 92 92 0 0
ENSRNA049756381-T1_L 103 103 0 0
ENSRNA049756382-T1_L 164 8.07353 0 0
ENSRNA049756383-T1_L 155 155 23 25.3591
kallisto-align
~/app/kallisto-align/kallisto-align -i test/emase.pooled.transcripts.k_idx -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate
`kallisto` distributed with `kallisto-align`:
/home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index -i emase.pooled.transcripts.kOld_index emase.pooled.transcripts.fa
[build] loading fasta file emase.pooled.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 44839 contigs and contains 20829208 k-mers
$ /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto quant -i emase.pooled.transcripts.kOld_index -o k_old ../../../fastq/SRR5125122_1.fastq.gz ../../../fastq/SRR5125122_2.fastq.gz
[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29496
[index] number of k-mers: 20829208
[index] number of equivalence classes: 56014
[quant] running in paired-end mode
[quant] will process pair 1: ../../../fastq/SRR5125122_1.fastq.gz
../../../fastq/SRR5125122_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 23476745 reads, 12434421 reads pseudoaligned
[quant] estimated average fragment length: 156.926
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 1088 rounds
$ head k_old/abundance.txt
target_id length eff_length est_counts tpm
ENSRNA049756373-T1_L 91 91 0.5 0.939
ENSRNA049756376-T1_L 86 86 1.5 2.98078
ENSRNA049756377-T1_L 101 101 0 0
ENSRNA049756378-T1_L 119 119 0 0
ENSRNA049756379-T1_L 141 141 0 0
ENSRNA049756380-T1_L 92 92 0 0
ENSRNA049756381-T1_L 103 103 0 0
ENSRNA049756382-T1_L 164 8.07353 0 0
ENSRNA049756383-T1_L 155 155 23 25.3591
`kallisto-align` with the new index:
~/app/kallisto-align/kallisto-align -i test/emase.pooled.transcripts.kOld_index -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate
I apologise again for whatever shambles or mistakes I made previously. `kallisto` is working fine, as expected I guess, and as I commented I had used it before.
Both the `kallisto` you distribute with `kallisto-align` and the one I downloaded are v0.42.1.
I am happy to send along or upload somewhere the transcriptome and fastq files if that helps to work out what is going on ...
Thanks again for your help on this!
That error message is coming from `kallisto` and is literally saying that your index does not match the expected version for some reason. Try to build the kallisto index using `/home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index`.
And the following does not look right, because you are providing the same fasta file for L and R. Usually you should provide L.fa and R.fa. Is `SRR5125117.fa` the diploid genome you created with `g2gtools`?
$ prepare-emase -G SRR5125117.fa,SRR5125117.fa -g SRR5125117.L.gtf,SRR5125117.R.gtf -s L,R -o test -m -x
If `SRR5125117.fa` is diploid, I think you should be able to simply do the following.
$ prepare-emase -G SRR5125117.fa -g SRR5125117.gtf -o test -m -x
Hi,
In the example above I did build the index with `/home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index`; see number 6 in the message above.
Regarding `prepare-emase`: yes, `SRR5125117.fa` is the diploid genome generated by `g2gtools`. I just tried to replicate the `emase` protocol, which has separate files for each haplotype.
Here is the test. It does not seem to make a difference.
$ prepare-emase -G SRR5125117.fa -g SRR5125117.gtf -o test2 -m -x
$ /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index -i test2/emase.transcripts.k_idx test2/emase.transcripts.fa
[build] loading fasta file test2/emase.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 44835 contigs and contains 20829133 k-mers
$ ~/app/kallisto-align/kallisto-align -i test2/emase.transcripts.k_idx -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate
Previously you said that `kallisto` currently uses index version 9 and `kallisto-align` uses version 8. However, the message says it expects version 0 of the index. Is that correct?
If I send you the transcriptome index, would you try to replicate the error?
Many thanks again
Quick question: what does `kallisto-align` actually do? If I run `kallisto` to generate a pseudobam file and convert it into the emase binary format with alntools, would that replace `kallisto-align`?
You are right, you can convert the `kallisto` pseudobam into an emase binary file and run `emase-zero`. But `kallisto-align` does it way faster. The `kallisto` that we carry should not create a version 9 index.
Let me know if there is anything I can do to help debug this. I will try the long side-path to test the `g2gtools` + `emase-zero` pipeline.
Thanks a lot again and sorry for the initial confusion.
Hi, any updates on this issue? I would love to use kallisto-align.
Regarding:
"You are right, you can convert kallisto pseudobam into emase binary file and run emase-zero. But kallisto-align does it way faster. The kallisto that we carry should not create Version 9 index."
What would be the equivalent steps? kallisto [fastq -> pseudobam] => alntools [pseudobam -> bin-emase] => emase-zero [awesome results]
Do you do local alignment of the reads to the transcripts? I understand that the pseudobam does not really align the read to the transcript but rather assigns the read to it and makes up a CIGAR string. Perhaps the real question is whether `emase-zero` needs alignments or whether it can work with read-transcript assignments.
Thanks in advance
Hi, I am getting the following error. I built the index with the kallisto provided with kallisto-align and also with a kallisto installed with conda in a separate environment. In both cases I get the following error.
Many thanks in advance