fmfi-compbio / warpstr

Determining tandem repeat lengths using raw nanopore signals.
https://fmfi-compbio.github.io/warpstr/

Error Running Test Data #6

Closed frankmartiniv closed 11 months ago

frankmartiniv commented 11 months ago

Hello, I am new to programming, and while troubleshooting the following error, I was unsure whether it was due to my input .fai file or the program. Any guidance would be appreciated.

Screenshot 2023-07-26 133505

xsitarcik commented 11 months ago

Hello, please check your input reference FASTA file. Are the chromosomes listed as chr#, e.g. chr4? Some references list chromosomes simply as #, without the chr prefix, so check whether that is your case. Then adjust the config file so that the loci coordinates match the chromosome naming in your reference.

Simply open the config file and locate the loci->coord element. You used the template config, so it should look like this:

...
loci:
  - name: Human_STR_1108232
    coord: chr4:183178378-183178421
...

Adjust the coord element accordingly. If your reference lists chromosomes as #, replace chr4 with just 4, so it would look like this:

...
loci:
  - name: Human_STR_1108232
    coord: 4:183178378-183178421
...
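
The same adjustment can be sketched in Python for configs with many loci. This is a minimal illustration only: the helper name `set_chr_prefix` is hypothetical and not part of warpstr, and it assumes every coord has the `chrom:start-end` form shown above.

```python
def set_chr_prefix(coord: str, use_prefix: bool) -> str:
    """Return coord with the 'chr' prefix added to or removed from the chromosome name.

    Hypothetical helper for illustration; assumes the 'chrom:start-end' form.
    """
    chrom, sep, span = coord.partition(":")
    if not sep:
        raise ValueError(f"expected 'chrom:start-end', got {coord!r}")
    # Strip an existing prefix first so the function is safe to apply twice.
    bare = chrom[3:] if chrom.startswith("chr") else chrom
    new_chrom = "chr" + bare if use_prefix else bare
    return f"{new_chrom}:{span}"


print(set_chr_prefix("chr4:183178378-183178421", use_prefix=False))
print(set_chr_prefix("4:183178378-183178421", use_prefix=True))
```

Only the chromosome name changes; the start and end positions are passed through untouched.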

frankmartiniv commented 11 months ago

Just replacing chr4 with 4 was unsuccessful. Here is what my FASTA reference file lists for chromosome four, though:

Screenshot 2023-07-27 110610

If this is incompatible, what is the recommended reference FASTA file?

xsitarcik commented 11 months ago

Sorry, I did not realize that you were running the test data and not your own data. The test data was produced with a human genome reference that uses chr# naming, so a reference with the same chromosome naming must be provided in the configuration file. This differs from your own experiments, where the match holds implicitly, i.e. the reference you set in the config file is the same one that was previously used to produce the input .bam files.

Thus, to run the test case successfully, revert your change of chr4 to 4 in the config file and provide a reference file with chr# naming. For example, you can use any .fna file from NCBI. For this test case it does not matter which one you download, as long as the chromosome naming is chr# (this can be easily checked by inspecting the .fai file).
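
As a rough sketch of that check in Python: a .fai index (as written by `samtools faidx`) is a tab-separated text file whose first column is the sequence name, so it is enough to read that column and look for `chr4`. The function name and the in-memory sample below are illustrative assumptions, and the lengths and offsets in the sample are made up.

```python
import io


def fai_sequence_names(handle):
    """Return the sequence names (first tab-separated column) of a .fai index."""
    names = []
    for line in handle:
        line = line.rstrip("\n")
        if line:
            names.append(line.split("\t")[0])
    return names


# Illustrative index content; the numeric columns are invented values.
sample_fai = io.StringIO(
    "chr3\t198000000\t6\t60\t61\n"
    "chr4\t190000000\t201300012\t60\t61\n"
)

names = fai_sequence_names(sample_fai)
if "chr4" in names:
    print("Reference uses chr# naming for chromosome 4.")
else:
    print("chr4 not found; names in the index:", names[:5])
```

For a real file, replace the `io.StringIO` sample with `open("reference.fna.fai")`, where the path is whatever index sits next to your reference.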

In your own experiment, you should use the same reference onto which you mapped your reads. Also, genomic coordinates for your loci should be adjusted accordingly, so the algorithm can properly locate the repeating sequence and its flanking sequences.
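
A quick sanity check of a locus against the reference index could look like the sketch below. It is not part of warpstr: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and the chromosome length in the example is an invented value.

```python
import io


def read_fai_lengths(handle):
    """Map sequence name -> length using the first two columns of a .fai index."""
    lengths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def check_coord(coord, lengths):
    """Return None if coord fits the reference, otherwise a short problem description."""
    chrom, _, span = coord.partition(":")
    start_text, _, end_text = span.partition("-")
    start, end = int(start_text), int(end_text)
    if chrom not in lengths:
        return f"{chrom!r} is not a sequence name in the reference index"
    # Assumption: 1-based, inclusive coordinates.
    if not (1 <= start <= end <= lengths[chrom]):
        return f"{coord} falls outside {chrom} (length {lengths[chrom]})"
    return None


# Illustrative index line; the length and offset are made up.
lengths = read_fai_lengths(io.StringIO("chr4\t190000000\t6\t60\t61\n"))
print(check_coord("chr4:183178378-183178421", lengths))
print(check_coord("4:183178378-183178421", lengths))
```

A `None` result only means the coordinate is plausible for that reference; it does not confirm that the repeat is actually located there.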