OSU-SRLab / MANTIS

Microsatellite Analysis for Normal-Tumor InStability
GNU General Public License v3.0

help: Error: Specified locus does not appear to be the starting point for kmer #22

Open drmjc opened 6 years ago

drmjc commented 6 years ago

Hi, I was hoping you could help me resolve some errors I'm having when running MANTIS. I'm getting thousands of these messages (almost one per BED entry):

Error: Specified locus does not appear to be the starting point for kmer.

Following the workflow below, the intervals in the BED file started 1bp before the repeat, so I bumped them up by 1, but got the same errors. The program still seems to run fairly happily though, and with slightly different scores from the 2 BED files (both being unstable in a moderate/high TMB tumour with a suspicious germline MSH6 variant).

Is it my BED (https://gist.github.com/drmjc/d62d9705b4ad7d6909cfb7b622c9d4d6), or something else?

Thanks for looking into this, Mark

The mantis bedfile was created as per the following:

  1. A 3 column bed file targeting the coding region's microsatellites was downloaded from the mSINGS app (https://bitbucket.org/uwlabmed/msings/src/b8c10cf58cecddb1356f7e9ee1ccbfdc29759314/doc/mSINGS_TCGA.bed?at=master&fileviewer=file-view-default).
  2. Using the RepeatFinder app, a bed file was produced covering the entire genome's content of microsatellites by feeding the app the hs37d5 genome fasta file.
  3. The RepeatFinder bed was run through the included fix_RF_bed_output.py script.
  4. bedtools intersect, with the 3 column bed file from step 1, was then used to narrow the whole genome bed down to ~2700 sites in the coding region containing microsatellites. This new file remained in the required format for MANTIS.
  5. The resulting intervals appeared to start 1bp before the repeat (start-1).

Code:

./RepeatFinder -i genome.fa -o genome_RepeatFinder.bed
python fix_RF_bed_output.py -i genome_RepeatFinder.bed -o genome_RepeatFinder_fixed.bed
bedtools intersect -a genome_RepeatFinder_fixed.bed -b mSINGS_TCGA.bed > hs37d5_microsatellites.bed
# fix start-1 error
awk -F $"\t" 'BEGIN {OFS=FS} {$2=$2+1; print}' hs37d5_microsatellites.bed > a
mv a hs37d5_microsatellites.bed
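
A note on the start-1 observation: BED start coordinates are 0-based, so an interval that looks 1bp early when compared against 1-based positions may already point at the first base of the repeat. The sketch below illustrates this on a made-up sequence. The helper name `starts_with_kmer` and the toy reference are assumptions for illustration only; this is not MANTIS's own check.

```python
def starts_with_kmer(reference, bed_start, kmer):
    """Return True if `kmer` begins exactly at the 0-based BED start."""
    return reference[bed_start:bed_start + len(kmer)] == kmer


# Toy reference (hypothetical): the (AC)4 repeat occupies 1-based
# positions 4-11, i.e. BED start=3, end=11 (0-based, half-open).
reference = "GGTACACACACTT"

print(starts_with_kmer(reference, 3, "AC"))      # BED start as written
print(starts_with_kmer(reference, 3 + 1, "AC"))  # start after a +1 shift
```

On this toy data the unshifted start matches the kmer and the shifted one does not, which would be consistent with the +1 adjustment not removing the errors.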

rbonneville commented 6 years ago

Thank you for your interest in MANTIS. I have a few questions to further narrow this down:

  1. Which version of MANTIS are you using? A similar issue was fixed in #11.
  2. Which version of Python?
  3. Which version of PySam?

drmjc commented 6 years ago

Hi,

I'll try with the newest mantis. thanks!

drmjc commented 6 years ago

How critical is using python3? it's taking me longer than expected to get the dependencies installed for py3. cheers

rbonneville commented 6 years ago

Not critical at all, MANTIS will also run with Python 2.

rbonneville commented 6 years ago

Hello @drmjc, do you have any further questions regarding this issue?

drmjc commented 6 years ago

thanks for the reminder. I will reinspect this after grants are submitted in a couple of weeks. cheers

kvaldez commented 3 years ago

Hi all, I ran into this same issue using bedtools intersect without the -wa flag, and saw in another post that @rbonneville recommended this flag. Once I used the recommended bedtools command, I stopped getting the error.

So instead of: bedtools intersect -a genome_RepeatFinder_fixed.bed -b mSINGS_TCGA.bed > hs37d5_microsatellites.bed

The command should be: bedtools intersect -a genome_RepeatFinder_fixed.bed -b mSINGS_TCGA.bed -wa > hs37d5_microsatellites.bed
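
A possible reason the flag matters: by default, bedtools intersect reports only the overlapping portion of each A interval, so a repeat that is only partly covered by a target region gets a clipped start while keeping its original kmer annotation. With -wa, the original A entry is written unchanged. The sketch below mimics the two behaviours for a single pair of intervals. The function `intersect` and the coordinates are hypothetical, and this explanation has not been confirmed by the MANTIS authors in this thread.

```python
def intersect(a, b, write_a=False):
    """Mimic the interval reported by `bedtools intersect` for one A/B pair.

    a, b: (chrom, start, end) tuples in BED coordinates.
    write_a=False -> only the overlapping portion of A (default behaviour).
    write_a=True  -> the original A interval (what -wa reports).
    Returns None when the intervals do not overlap.
    """
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    if start >= end:
        return None
    return a if write_a else (a[0], start, end)


# Hypothetical coordinates: a repeat from the RepeatFinder BED, and a
# target region that begins 4 bp inside it.
repeat = ("1", 1000, 1020)
target = ("1", 1004, 1050)

print(intersect(repeat, target))                # ('1', 1004, 1020)
print(intersect(repeat, target, write_a=True))  # ('1', 1000, 1020)
```

In the clipped case the reported start (1004) is no longer the first base of the repeat, which is the situation the error message describes.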

kvaldez commented 3 years ago

Hopefully this helps someone. Can you put this detail in the documentation please?