bioinform / longislnd

LongISLND - Long In silico Sequencing of Lengthy and Noisy Datatypes

Error when running simulate.py #40

Open AsmaaSamyMohamedMahmoud opened 2 years ago

AsmaaSamyMohamedMahmoud commented 2 years ago

Hi, I am trying to simulate long reads using the simulate.py script. This is the command line I used:

./simulate.py --fasta /home/asamy/scratch/ensemble_ref_hg38/Homo_sapiens.GRCh38.dna.chromosome.1.fa --movie_id ONT --read_type fastq --coverage 15 --min_frag 600 --max_frag 140000

It failed with:

AssertionError: /project/6032807/asamy/longislnd-0.9.5/run is not a directory

So I created the run directory, added an error profile to it, and ran the command again, but got this error:

AssertionError: failed to find models in directory /project/6032807/asamy/longislnd-0.9.5/run

Could you help me solve the issue?

yunfeiguo commented 2 years ago

Hi @AsmaaSamyMohamedMahmoud, /project/6032807/asamy/longislnd-0.9.5/run is the default path where LongISLND looks for a model. Please create a model using sample.py first, and then specify the model path using the --model_dir option in simulate.py.
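
As a rough sketch of that second step, the snippet below checks that a model directory exists and is not empty before assembling the simulate.py command. The helper name build_simulate_command and any paths passed to it are hypothetical; the flags are only the ones that appear in this thread, and the emptiness check is an assumption, not LongISLND's own validation.

```python
from pathlib import Path


def build_simulate_command(fasta, model_dir, coverage=15):
    """Return an argv list for simulate.py after basic checks on model_dir.

    Hypothetical helper: it only verifies that model_dir is a non-empty
    directory. Whether the files inside are valid models is still decided
    by simulate.py itself.
    """
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        raise NotADirectoryError(f"{model_dir} is not a directory")
    if not any(model_dir.iterdir()):
        raise FileNotFoundError(f"{model_dir} is empty; run sample.py first")
    return [
        "./simulate.py",
        "--fasta", str(fasta),
        "--model_dir", str(model_dir),
        "--movie_id", "ONT",
        "--read_type", "fastq",
        "--coverage", str(coverage),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains simulate.py.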

AsmaaSamyMohamedMahmoud commented 2 years ago

Hi @yunfeiguo, thank you for your reply. I still have a problem because I don't have an alignment file to use as input for sample.py. I only have a reference genome, from which I want to simulate long reads.

yunfeiguo commented 2 years ago

LongISLND's simulation relies on an error model that is built from an alignment file. If you don't have any real data to generate the alignment file, one solution is to use public data, e.g. PacBio or Oxford Nanopore data on E. coli. Note that the alignment file used for building the error model can be based on any reference genome (it does not have to be the same genome used for simulation) as long as it contains enough k-mers. The E. coli genome contains all possible 7-mers, so its 7-mer error model can be used to simulate any other genome.
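
To check the k-mer condition for a candidate reference, something like the following sketch could be used. It lists which of the 4^k possible A/C/G/T k-mers never occur in a sequence. The function names are hypothetical, and it looks only at the forward strand and ignores k-mers containing N, which may differ from how LongISLND counts contexts.

```python
from itertools import product


def read_fasta_sequences(path):
    """Yield one concatenated sequence string per FASTA record."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                chunks = []
            else:
                chunks.append(line.strip())
    if chunks:
        yield "".join(chunks)


def missing_kmers(sequence, k=7):
    """Return the set of A/C/G/T k-mers that never occur in `sequence`."""
    seq = sequence.upper()
    # Every length-k window in the sequence (forward strand only).
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    # All 4**k possible k-mers over the DNA alphabet.
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    return all_kmers - seen
```

For a multi-record FASTA, the per-record results can be intersected; an empty result means every possible k-mer is present in at least one record.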

AsmaaSamyMohamedMahmoud commented 2 years ago

Thank you for your clarification.