Closed: Zjianglin closed this issue 3 years ago.
Hi @Zjianglin, thanks for your interest in LinearPartition. LinearPartition does have an option to read multiple sequences from a file, and it is listed in the README:
cat SEQ_OR_FASTA_FILE | ./linearpartition [OPTIONS]
Both FASTA format and pure-sequence format are supported for input.
The recommended parameters are beam size 100, MEA gamma 1.5 and ThreshKnot threshold 0.3 for LinearPartition-V, and beam size 100, MEA gamma 3 and ThreshKnot threshold 0.2 for LinearPartition-C. These parameters are not sensitive to sequence length, so you can use them for all of your sequences.
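The recommended settings can be kept in one place and turned into a command line, for example when scripting many runs. The following is only a minimal sketch: the helper name `mea_command` is hypothetical, and the `-b` and `--gamma` flag names, as well as the assumption that LinearPartition-C is the mode used when `-V` is absent, should be checked against the README or the help output of your build (`-V` and `-M` are the flags used elsewhere in this thread).

```python
from typing import Dict, List

# Recommended settings quoted in the reply above, keyed by model:
# "V" = LinearPartition-V, "C" = LinearPartition-C.
RECOMMENDED: Dict[str, Dict[str, float]] = {
    "V": {"beam_size": 100, "mea_gamma": 1.5, "threshknot_threshold": 0.3},
    "C": {"beam_size": 100, "mea_gamma": 3.0, "threshknot_threshold": 0.2},
}


def mea_command(mode: str, binary: str = "./linearpartition") -> List[str]:
    """Build an argument list for an MEA run with the recommended settings.

    Assumption: "-b" sets the beam size and "--gamma" sets the MEA gamma;
    verify both against your installed version before relying on them.
    """
    params = RECOMMENDED[mode]
    argv = [binary]
    if mode == "V":
        argv.append("-V")  # assumed: without -V the tool runs LinearPartition-C
    argv += ["-b", str(int(params["beam_size"]))]
    argv += ["-M", "--gamma", str(params["mea_gamma"])]
    return argv


if __name__ == "__main__":
    print(" ".join(mea_command("V")))
```

The resulting list can be passed to `subprocess.run` with the sequences supplied on standard input, matching the `cat SEQ_OR_FASTA_FILE | ./linearpartition [OPTIONS]` usage above.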
Thanks!
Hi @LinearFold, thanks for your reply.
I tried cat SEQ_OR_FASTA_FILE | ./linearpartition [OPTIONS] on some demo sequences. However, linearpartition (and linearfold) seems to process sequences "line by line", as shown below:
$ cat demo.fa
>MT354616_UTR5
AGAUUUUCUUGCACGUGCGUGCGAUUGCUUCAGACAGCAGUAGCAGCGGCAGAGUUGGCA
GAGAGACUUACUCACGUCGACCAGUCGUGAACGUGUUGAGGAAAAGACAGCUUAGGAGAA
CAAGAGCUGGGA
>MT354615_UTR5
AGAUUUUCUUGCACGUGCGUGCGCUUGCUUCAGACAGCAAUAGCAGCGGCAGGUUUGGUG
GAGGGAAUUGCCCGCAUCAGCCAGUCGUGAACGUGUUGAGAAAAAGACAGCUUAGGAGAA
CAAGAGCUGGGG
###############################
$ cat demo.fa | linearfold -V
>MT354616_UTR5
AGAUUUUCUUGCACGUGCGUGCGAUUGCUUCAGACAGCAGUAGCAGCGGCAGAGUUGGCA
.....(((.(((.(((...(((.((((((......)))))).))))))))))))...... (-15.70)
GAGAGACUUACUCACGUCGACCAGUCGUGAACGUGUUGAGGAAAAGACAGCUUAGGAGAA
......((((..(((((..((.....))..))))).)))).................... (-9.50)
CAAGAGCUGGGA
............ (-0.00)
>MT354615_UTR5
AGAUUUUCUUGCACGUGCGUGCGCUUGCUUCAGACAGCAAUAGCAGCGGCAGGUUUGGUG
.......(((((.(((...(((..(((((......)))))..)))))))))))....... (-16.40)
GAGGGAAUUGCCCGCAUCAGCCAGUCGUGAACGUGUUGAGAAAAAGACAGCUUAGGAGAA
(.(((.....))).).........((.(((.(.((((........))))).))).))... (-8.50)
CAAGAGCUGGGG
............ (-0.00)
#############################################
$ cat demo.fa | linearpartition -V -M
>MT354616_UTR5
Free Energy of Ensemble: -17.32 kcal/mol
AGAUUUUCUUGCACGUGCGUGCGAUUGCUUCAGACAGCAGUAGCAGCGGCAGAGUUGGCA
.....(((.(((.(((...(((.((((((......)))))).))))))))))))......
Free Energy of Ensemble: -10.50 kcal/mol
GAGAGACUUACUCACGUCGACCAGUCGUGAACGUGUUGAGGAAAAGACAGCUUAGGAGAA
......((((..(((((..((.....))..))))).))))....................
Free Energy of Ensemble: -0.05 kcal/mol
CAAGAGCUGGGA
............
>MT354615_UTR5
Free Energy of Ensemble: -17.28 kcal/mol
AGAUUUUCUUGCACGUGCGUGCGCUUGCUUCAGACAGCAAUAGCAGCGGCAGGUUUGGUG
.......(((((.(((...(((..(((((......)))))..))))))))))).......
Free Energy of Ensemble: -9.99 kcal/mol
GAGGGAAUUGCCCGCAUCAGCCAGUCGUGAACGUGUUGAGAAAAAGACAGCUUAGGAGAA
..(((.....)))((....))...((.(((.(.((((........))))).))).))...
Free Energy of Ensemble: -0.05 kcal/mol
CAAGAGCUGGGG
............
It predicted RNA structures for the sequences line by line. But I want to know the secondary structure of each whole sequence, especially for some long sequences. Did I use the wrong options? Or how can I get the complete structure for my sequences?
Thank you.
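For a build that treats every input line as a separate sequence, one possible workaround is to join the wrapped lines of each FASTA record before piping them in. The sketch below is an assumption-laden illustration rather than part of LinearPartition: the function name `unwrap_fasta` and the script name used afterwards are hypothetical, and it expects FASTA input with `>` headers (pure-sequence input with one sequence per line would be merged into a single sequence, so it should not be passed through this filter).

```python
import sys
from typing import Iterable, Iterator, List


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield each header unchanged, followed by its sequence on one line."""
    chunks: List[str] = []  # pieces of the sequence currently being assembled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if chunks:
                # A new record starts: emit the previous sequence first.
                yield "".join(chunks)
                chunks = []
            yield line
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)  # emit the last sequence


if __name__ == "__main__":
    for out in unwrap_fasta(sys.stdin):
        print(out)
```

Saved as, say, unwrap_fasta.py, it could be used as `python unwrap_fasta.py < demo.fa | linearpartition -V -M`, so that each record reaches the tool as a header line followed by a single sequence line.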
Thanks for your suggestions @Zjianglin, we'll update the code base to allow such input shortly.
FASTA input is now supported.
Hi, it seems LinearPartition can only read from a pipe stream. I have many sequences (some <1000 bp and some ranging from 5000 bp to 12000 bp). How can I process them quickly, for example by reading directly from a FASTA file? There does not seem to be such an option. (LinearFold has the same problem.) What's more, for sequences of very different lengths, what parameters should I use (for example beam size, MEA gamma, threshold, etc.)? Thank you. I'm looking forward to your reply.