bcgsc / NanoSim

Nanopore sequence read simulator

No such file or directory: 'training_ht_length.pkl' #215

Closed kubracelikbas closed 3 months ago

kubracelikbas commented 3 months ago

I'm trying to run the read simulator in perfect mode with the following command:

simulator.py genome -rg simulated/simulated.fasta -k 6 -b guppy -s 1 --perfect --fastq

But it fails with an error whose cause I can't work out:

Traceback (most recent call last):
  File "/miniconda3/envs/nanosim/bin/simulator.py", line 2400, in <module>
    main()
  File "/miniconda3/envs/nanosim/bin/simulator.py", line 2161, in main
    read_profile(ref_g, number, model_prefix, perfect, args.mode, strandness, dna_type=dna_type, chimeric=chimeric)
  File "miniconda3/envs/nanosim/bin/simulator.py", line 523, in read_profile
    kde_ht = joblib.load(model_prefix + "_ht_length.pkl")
  File "/miniconda3/envs/nanosim/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 579, in load
    with open(filename, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'training_ht_length.pkl'

I would appreciate any help.

Thank you.

lcoombe commented 3 months ago

Hi @kubracelikbas,

The error indicates that NanoSim cannot find the error profile files produced by its characterization step; the traceback shows it looking for training_ht_length.pkl relative to the current directory. Check where those files are located on your system, and use -c to give their location and file prefix. Alternatively, you can use one of the pre-trained models provided in our GitHub repo.
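As a quick check before launching the simulator, the lookup that fails in the traceback can be reproduced by hand. The sketch below is not part of NanoSim; the helper names are made up for illustration, and it only tests for the one file named in the error message (the real model has other files that this does not verify).

```python
from pathlib import Path


def ht_length_model_path(model_prefix: str) -> Path:
    """Return the path that the traceback shows simulator.py trying to load."""
    # Mirrors the call in the traceback: joblib.load(model_prefix + "_ht_length.pkl")
    return Path(model_prefix + "_ht_length.pkl")


def prefix_looks_valid(model_prefix: str) -> bool:
    """True if the file named in the error message exists for this prefix."""
    return ht_length_model_path(model_prefix).is_file()


if __name__ == "__main__":
    # "training" is the prefix that appears in the error message.
    prefix = "training"
    target = ht_length_model_path(prefix)
    if prefix_looks_valid(prefix):
        print(f"found {target}")
    else:
        print(f"missing {target}; point -c at the directory and prefix that hold it")
```

If the check fails for the prefix you intend to pass with -c, the simulator will stop with the same FileNotFoundError.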

Thank you for your interest in NanoSim! Lauren

kubracelikbas commented 3 months ago

Hi @lcoombe,

Thanks for the answer. I want to simulate nanopore reads with structural variants embedded. I've created such a simulated reference genome, and now I want to generate a FASTQ file of reads from it. Do I need to use a pre-trained model in that case, and if so, which one should I choose?

Thanks a lot for the help!

Kübra

lcoombe commented 3 months ago

Hi @kubracelikbas,

The pre-trained models are available here: https://github.com/bcgsc/NanoSim/tree/master/pre-trained_models

You can choose the one best suited to the type of ONT reads you want to simulate (i.e., DNA or cDNA, and which basecaller). Our newest model for genomic reads is human_giab_hg002, which was trained on dorado basecalled reads. Once you have downloaded the model you want to use, make sure you decompress and untar the file, then point the -c parameter at the path to the extracted files, including the file prefix.
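For illustration, the unpack-and-locate step could be scripted roughly as follows. This is a sketch under assumptions, not NanoSim code: the archive name and destination directory are hypothetical placeholders, the helper names are invented, and the prefix is inferred only from the _ht_length.pkl file name seen in the traceback.

```python
import tarfile
from pathlib import Path
from typing import List

# File name suffix taken from the traceback in this thread.
SUFFIX = "_ht_length.pkl"


def extract_model(archive: str, dest: str) -> None:
    """Unpack a downloaded model archive (only use archives you trust)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    # The default read mode detects gzip/bzip2/xz compression automatically.
    with tarfile.open(archive) as tar:
        tar.extractall(dest)


def find_model_prefixes(model_dir: str) -> List[str]:
    """List candidate -c values: the path of each *_ht_length.pkl minus the suffix."""
    matches = sorted(Path(model_dir).rglob("*" + SUFFIX))
    return [str(path)[: -len(SUFFIX)] for path in matches]


if __name__ == "__main__":
    # Hypothetical names; substitute the archive you actually downloaded.
    archive, dest = "model.tar.gz", "models"
    if Path(archive).is_file():
        extract_model(archive, dest)
        for prefix in find_model_prefixes(dest):
            print(prefix)
```

Each printed value is a directory plus file prefix, which is the form that -c expects according to the description above.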

Hope that helps! Lauren

SaberHQ commented 3 months ago

Thanks for using NanoSim @kubracelikbas and thank you @lcoombe for the comments.

I would also suggest reading the README, which explains how to run NanoSim and describes the input and output files in detail.

With that said, I am closing this issue. Please feel free to reopen it or ask further questions, and we will be happy to help.

Best, Saber.