Tocci89 closed this issue 6 years ago
Though that is definitely not an intended error message, did you happen to leave off the --fasta flag when passing the reference sequence?
E.g. before MYREF.fasta:
python /my/path/to/sim3C/sim3C.py -C gzip -m hic -e MboI -l 125 -r 23 -n 232709663 --dist uniform --machine-profile HiSeq2500L125 --fasta MYREF.fasta OUTPUT_sim_reads.fastq.gz
Please let me know how you make out.
If I add --fasta I get: sim3C.py: error: unrecognized arguments: --fasta
Anyway, what bothers me is the "Error: 'n'" message; I don't understand what it refers to.
Sorry, my mistake: the UX change I was referring to is in an experimental branch (not intended for use at the moment). Not that I expected it to fix that error. Never answer bug reports late in the evening.
In testing this myself with your exact command line, I do not get an error, but I don't have the benefit of your reference data. Can you send me the file, or perhaps post it to Zenodo?
I have just pushed a small update to the master branch which has a small improvement to error handling when reading the reference sequence file. The commit also includes the option to print a trace of the exception.
Could you try running Sim3C again with the --debug option added?
You might get a better error message now, but the trace would help me.
I have a suspicion the error you are experiencing is originating within Bio.SeqIO.
Thank you for the help. I think I've solved the mystery... I tried running with other fasta and multifasta files and the script works. So the problem was in my input fasta, and I found that the only difference was that the letters in my fasta were all lowercase. After converting to uppercase, everything worked well. I was confident that the translator in Art.py could handle both cases, but maybe lowercase "n" characters are causing trouble. Anyway, it is working fine now! Thanks again!!!
Ok, thank you for the report. That is a surprising defect!
This error is actually due to non-ACGT characters and is not case related. Thanks for bringing it to my attention.
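For anyone who wants to check a reference for such characters before running a simulation, here is a minimal sketch. It is not part of Sim3C; the function name `count_non_acgt` and the in-memory example are made up for illustration, and it is written for Python 3. The comparison is case-sensitive by default, so lowercase bases are reported too; pass `allowed="ACGTacgt"` to ignore case.

```python
from collections import Counter
import io


def count_non_acgt(handle, allowed="ACGT"):
    """Count characters outside `allowed` in the sequence lines of a FASTA stream.

    Header lines (starting with '>') and blank lines are skipped.
    This helper is hypothetical and not part of Sim3C.
    """
    allowed_set = set(allowed)
    unexpected = Counter()
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        unexpected.update(ch for ch in line if ch not in allowed_set)
    return unexpected


# Tiny in-memory example; use open("MYREF.fasta") instead for a real file.
example = io.StringIO(">seq1\nACGTNNAC\nacgtn\n")
report = count_non_acgt(example)
print(dict(report))

# Same data, but treating lowercase acgt as acceptable.
example_ci = io.StringIO(">seq1\nACGTNNAC\nacgtn\n")
report_ci = count_non_acgt(example_ci, allowed="ACGTacgt")
print(dict(report_ci))
```

An empty result means every sequence character is in the allowed set.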
Hi, I want to simulate Hi-C reads from my fasta reference. I set all the options and ran the command reported below:
python /my/path/to/sim3C/sim3C.py -C gzip -m hic -e MboI -l 125 -r 23 -n 232709663 --dist uniform --machine-profile HiSeq2500L125 MYREF.fasta OUTPUT_sim_reads.fastq.gz
Here's what I get:
Warning: no reference supplied, calls will have to supply a template
Starting sequencing simulation
Library method: hic
Progress: 0%| | 62/232709663 [00:00<211:14:51, 306.00it/s]
Error: 'n'
I'm running sim3C.py with python 2.7; the fasta file has a .fai index.
Thanks in advance for your help
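A note on why the message is so terse. In Python, a `KeyError` converts to a string as the repr of the missing key, so a handler that prints only the exception text shows just `'n'`. The sketch below reproduces that behaviour under one assumption that this thread does not confirm: that the failure came from a dictionary lookup keyed on base characters. The `COMPLEMENT` table and both functions are hypothetical and are not taken from the Sim3C or Art.py source.

```python
# Hypothetical lookup table, for illustration only (not Sim3C/Art.py code).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Complement a sequence by dictionary lookup; unknown symbols raise KeyError."""
    return "".join(COMPLEMENT[base] for base in seq)


def run(seq):
    """Mimic a top-level handler that reports only the exception message."""
    try:
        return complement(seq)
    except Exception as ex:
        # str(KeyError('n')) is "'n'", hence the bare-looking message.
        return "Error: {}".format(ex)


print(run("ACGT"))
print(run("ACnT"))
```

Printing a full traceback instead of only the message, as the --debug option mentioned in this thread is meant to do, shows the exception type and the line that raised it.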