Closed artsiomkaltovich closed 4 years ago
If you have the pre-trained models at hand, you just need to run simulator.py and specify the prefix of the pre-trained model using the -c option.
Hello @cheny19
I've tried
bin/simulator.py transcriptome -e ~/projects/ngs-analysis data/human_NA12878_dDNA_Bham1_guppy/expression_abundance.tsv -c human_NA12878_dRNA -o ~/projects/ngs-analysis/data/human-drna
It failed because the -rt parameter wasn't specified. What should be specified here? By the way, is the -e option necessary in this case?
-rt is the reference transcriptome that you want to simulate, and -e is the abundance profile, which is also included in the pre-trained model zip file. If you want to simulate a transcriptome with different expression levels, you can modify that file. But if you want to simulate a different species, you'll need to create your own abundance profile with the transcripts from that species. Please also refer to the README.md and the tool's help message for usage.
Hello.
It still isn't clear. Could you give the exact command one should use?
Should I specify both -rt and -rg with the same gff3 file from the pre-trained model?
It depends on what transcriptome you want to simulate. If you want to simulate the same species, you can specify the same gff3 as in the pre-trained models. But -rt is used to specify the transcriptome, not the annotation file (gff3 file). A transcriptome should be a fasta file, which you can download from Ensembl, RefSeq, or UCSC; the file name may contain cdna or similar words. We would suggest you download the latest version built on the latest genome assembly, because these files are manually curated and updated regularly. If you still have trouble, you can show me the link you find, and we can double-check it for you.
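To make the roles of the options concrete, here is a minimal sketch that only assembles and prints a command line from the flags mentioned in this thread (-rt, -rg, -e, -c, -o). The function name and every file path are hypothetical placeholders, and whether all five options are needed for a given run is an assumption; the tool's help message is the authority.

```python
import shlex


def build_simulator_command(ref_transcriptome, ref_genome, abundance,
                            model_prefix, out_prefix):
    """Assemble an argument list using only flags mentioned in this thread.

    This does not run anything; it just shows which file goes with which flag.
    """
    return [
        "simulator.py", "transcriptome",
        "-rt", ref_transcriptome,  # transcriptome FASTA, not a gff3/gtf
        "-rg", ref_genome,         # genome FASTA
        "-e", abundance,           # abundance profile (TSV)
        "-c", model_prefix,        # prefix shared by the model files
        "-o", out_prefix,          # output prefix
    ]


if __name__ == "__main__":
    # All paths below are placeholders, not real file names.
    cmd = build_simulator_command(
        "reference/transcripts.fa",
        "reference/genome.fa",
        "model/expression_abundance.tsv",
        "model/training",
        "out/simulated",
    )
    print(shlex.join(cmd))
```

Printing the command first makes it easy to spot a gff3 file passed to -rt by mistake before starting a run.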
"We would suggest you download the latest version built on latest genome assembly"
OK, I thought the same version of the reference as used for model training was required.
Thank you, I will try.
Not really, but your reference transcriptome (fasta) file has to match your annotation file (gtf) and abundance profile in the simulation stage. So if you use another version of the fasta file, you may need to adjust your abundance profile, that is, the transcripts listed in it, so that NanoSim is able to find every corresponding transcript in the fasta file and simulate it.
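One way to check this before running the simulation is a small script that lists transcripts present in the abundance profile but absent from the fasta file. This is only a sketch under assumptions: that the abundance file is tab-separated with a header row and the transcript ID in the first column, that the fasta ID is the first token of the header line (cut at the first "|"), and that IDs are compared as exact strings. How NanoSim itself matches IDs may differ, and all function names here are made up for illustration.

```python
import csv


def fasta_ids(fasta_path):
    """Collect transcript IDs from FASTA header lines.

    Assumption: the ID is the first whitespace-delimited token after '>',
    truncated at the first '|' if one is present.
    """
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0].split("|")[0])
    return ids


def abundance_ids(abundance_path, has_header=True):
    """Read transcript IDs from the first column of a tab-separated file."""
    with open(abundance_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the header row
        return [row[0] for row in reader if row]


def missing_transcripts(fasta_path, abundance_path):
    """Return abundance-profile IDs that have no entry in the FASTA file."""
    known = fasta_ids(fasta_path)
    return [tid for tid in abundance_ids(abundance_path) if tid not in known]
```

If `missing_transcripts(...)` returns a non-empty list, those rows are the ones to remove or rename in the abundance profile, or the fasta version needs to change.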
Hello again.
So could you give a link where one can download the references for human_NA12878_dRNA_Bham1_guppy.tar.gz?
Also, when I try to run the following command:
simulator.py transcriptome -rg GRCh38.primary_assembly.genome.fa -rt gencode.v32.transcripts.fa -e expression_abundance.tsv -o ~/project/isoquant/data/
It fails with:
Traceback (most recent call last):
  File "/home/akaltovich/miniconda3/envs/nanosim/bin/simulator.py", line 1513, in <module>
    main()
  File "/home/akaltovich/miniconda3/envs/nanosim/bin/simulator.py", line 1503, in main
    read_profile(ref_g, ref_t, number, model_prefix, perfect, args.mode, strandness, exp, model_ir, "linear")
  File "/home/akaltovich/miniconda3/envs/nanosim/bin/simulator.py", line 397, in read_profile
    with open(model_prefix + "_match_markov_model", 'r') as mm_profile:
FileNotFoundError: [Errno 2] No such file or directory: 'training_match_markov_model'
Is that file missing from the archive?
Since it is a human dataset, you can use the Ensembl FTP site to download the reference genome and transcriptome.
Your command failed because you did not specify the prefix of your pre-trained model. You need to download those models, and since they are gzipped tarballs, you need to extract them and use -c to specify the prefix of the pre-trained models (i.e. the common string shared among almost all the profile files).
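If it is not obvious what the prefix is after extracting the archive, a small helper can derive it from the file names. The sketch below relies on one fact visible in the traceback above: simulator.py opens model_prefix + "_match_markov_model". Everything else (the function names and the idea of scanning a single directory) is an assumption made for illustration, not part of NanoSim.

```python
import os

# Suffix taken from the traceback: open(model_prefix + "_match_markov_model")
MARKOV_SUFFIX = "_match_markov_model"


def find_model_prefixes(model_dir):
    """Return candidate -c values found in an extracted model directory.

    A candidate is the path of any file ending in MARKOV_SUFFIX with that
    suffix stripped off.
    """
    prefixes = []
    for name in sorted(os.listdir(model_dir)):
        if name.endswith(MARKOV_SUFFIX):
            stem = name[: -len(MARKOV_SUFFIX)]
            prefixes.append(os.path.join(model_dir, stem))
    return prefixes


def prefix_looks_valid(prefix):
    """Check that the file simulator.py tries to open exists for this prefix."""
    return os.path.isfile(prefix + MARKOV_SUFFIX)
```

Calling `prefix_looks_valid` on the value intended for -c reproduces the check that failed in the traceback, without starting a simulation.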
Hello.
I am new to the NanoSim tools, so sorry for a possibly stupid question, but I can't figure out how to use the pre-trained models. What should I specify as the -rt argument? Thank you.