diriano / ploidyNGS

Explore ploidy levels from NGS data alone
GNU General Public License v3.0
38 stars 14 forks source link

about simulatePloidyData.py #18

Closed abshah closed 2 years ago

abshah commented 2 years ago

Dear developers of ploidyNGS,

I have been trying to run the script simulatePloidyData.py to generate a chromosome with heteromorphic loci at the given heterozygosity.

Initially after encountering some trivial errors with respect to python versions (ver 2 vs ver 3), and some straight forward deprecated function usage, I encounter a bit more complex errors. Firstly, the script is unable to handle ploidy levels below 2 (eg. --ploidy 1). Secondly, I get a ValueError when I try to run the script with ploidy 2 (see below).

Please advise on how to fix these errors or possible work-arounds.

Thank you

Best regards, Abhijeet Shah

ploidy error:

python ~/Downloads/ploidyNGS/simulation/simulatePloidyData.py --genome MyChromosome.fasta --heterozygosity 0.01 --ploidy 1

Traceback (most recent call last):
  File "/Downloads/ploidyNGS/simulation/simulatePloidyData.py", line 98, in <module>
   randomDosageTable = random.randint(0,lenDosageTable-1)
  File "/miniconda3/envs/ploidy/lib/python3.10/random.py", line 370, in randint
   return self.randrange(a, b+1)
  File "/miniconda3/envs/ploidy/lib/python3.10/random.py", line 353, in randrange
   raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, 0, 0)

ValueError:

python ~/Downloads/ploidyNGS/simulation/simulatePloidyData.py --genome MyChromosome.fasta --heterozygosity 0.01 --ploidy 2

Traceback (most recent call last):
  File "/Downloads/ploidyNGS/simulation/simulatePloidyData.py", line 102, in <module>
   GenomeAlph.remove(base)
ValueError: list.remove(x): x not in list
SantosRAC commented 2 years ago

Hi @abshah

Thanks a lot for your question!

Regarding the two runs:

  1. Our simulatePloidyData.py was designed to generate genomic sequences with ploidy higher than 1 (with bi-allelic positions). This is why it is not able to generate a genome with ploidy 1. If you want to generate a MyChromosome.fasta with some modifications ('variants'), I would rather iterate over the genome (like our script does), and output a copy that changes the original allele every ~ X bases (depending on the "heterozygosity" you expect).
  2. Regarding the error you get when running with ploidy 2: from what you reported, it looks like you have a base different from A, T, C, or G in your genome (MyChromosome.fasta). Please, let me know if your input really has anything different from the expected. Since this is possible (e.g., other letters in IUPAC standards), I am happy to adapt the code accordingly.
abshah commented 2 years ago

Hi @SantosRAC ,

Thank you for your answer.

1.After briefly thinking and discussing about the heterozygosity of haploid individuals, it maybe safe to assume heterozygosity would not matter much in the context we are investigating.

  1. Yes! it was a nominal issue. I have made the necessary changes on my fork. Merge if you think it would be helpful to others.

Best regards, Abhijeet