RemiAllio / MitoFinder

MitoFinder: efficient automated large-scale extraction of mitogenomic data from high throughput sequencing data

Using scafSeq output for Phyluce #16

Closed nonnohasegawa closed 3 years ago

nonnohasegawa commented 3 years ago

Hello!

I'm a newbie in bioinformatics and have recently been working my way through the phyluce tutorial: https://phyluce.readthedocs.io/en/latest/tutorial-one.html#finding-uce-loci

I'm trying to use the match contigs to probes function (phyluce_assembly_match_contigs_to_probes), which requires the --contigs input. I thought I could use the scafSeq files produced by MitoFinder (and it is suggested here!)

However, I keep getting the same error: it cannot create the database. I'm wondering if you or anyone else has tried to use .scafSeq files for UCE mining with phyluce. Any ideas would help!

P.S. I already tried renaming the input files and I am still getting the same error.

script:

IN_DIR="/flash/BourguignonU/Nonno/Phyluce/scafseq"
PROBE="/flash/BourguignonU/Nonno/Phyluce/uce-loci/termite-master.fasta"
phyluce_assembly_match_contigs_to_probes \
        --contigs ${IN_DIR} \
        --probes $PROBE \
        --output uce-search-results \
        --log-path log

error:

2020-11-06 16:04:13,217 - phyluce_assembly_match_contigs_to_probes - CRITICAL - Database already exists
2020-11-06 16:04:13,217 - phyluce_assembly_match_contigs_to_probes - CRITICAL - Cannot create database

RemiAllio commented 3 years ago

Hi there!

Ok, so it's a bit difficult for me to give you a solution without more information. The problem seems to come from PHYLUCE.

First, the scafSeq file of MitoFinder is a symbolic link, meaning that it refers to another file. Could you confirm that the link works by running: less /flash/BourguignonU/Nonno/Phyluce/scafseq

If you can see the file contents, everything is okay.

Then, the problem seems to be that PHYLUCE is unable to read the file. It may not like symbolic links. I would recommend copying the real file (which you can find with ls -lh /flash/BourguignonU/Nonno/Phyluce/scafseq) into the directory from which you are running PHYLUCE. Try to give the file the simplest name possible.

Tell me if it works.

Cheers,
Rémi
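A minimal sketch of the check-and-copy step suggested above, assuming a shell where `readlink -f` and `cp -L` are available (GNU coreutils). The function name `copy_real_file` and the destination directory are hypothetical; neither comes from MitoFinder or PHYLUCE.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file is a symbolic link, then copy
# the real file it resolves to into a destination directory.
copy_real_file() {
    src="$1"
    dest_dir="$2"

    if [ -L "$src" ]; then
        # readlink -f prints the fully resolved target of the link
        echo "symlink -> $(readlink -f "$src")"
    else
        echo "regular file"
    fi

    mkdir -p "$dest_dir"
    # -L makes cp follow the link, so the copy is a regular file
    cp -L "$src" "$dest_dir/"
}

# Example call (paths are placeholders, adjust to your own layout):
# copy_real_file /path/to/sample.scafSeq ./contigs
```

The copy in the destination directory is always a regular file, so a tool that dislikes symbolic links should be able to read it.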

nonnohasegawa commented 3 years ago

Hello Rémi!

Thanks for such a quick response! When I run head on one of the scafSeq files, it actually gives me back this:

>NODE_1_length_15408_cov_26.597310
TAAGCCAACTTTTCCACCCACTACAAAGAAAAAAGAATAATAACCCCCAAACCTGCACTG.....

which to me doesn't seem like a symlink (right?). In other words, when I run less on the directory, it does not give me a link to another file.

and ls -lh does this:

-rw-r--r-- 1 nonno-hasegawa3 bourguignonuni 156M Sep 17 11:02 A1448-link-metaspades.scafSeq
-rw-r--r-- 1 nonno-hasegawa3 bourguignonuni 226M Sep 17 11:02 A1449-link-metaspades.scafSeq
etc...

Perhaps this is an issue?

And as you suggested, I gave them pretty simple names, yet I still get some errors. I will also open an issue on the phyluce GitHub to see if they know anything about using scafSeq files.

Best, Nonno

RemiAllio commented 3 years ago

Hi Nonno,

I saw on the PHYLUCE GitHub that Brant Faircloth told you that the issue you had could relate to your sample names. Indeed, sqlite (the database) does not like dashes (-) in file names, or really any characters other than letters or underscores.

Given that you told B. Faircloth that it worked for you after renaming your input files, I think I can close this issue. Please do not hesitate to reopen it if you feel it is necessary.

Cheers,
Rémi
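For anyone hitting the same error, here is a minimal sketch of the renaming fix, assuming the only offending character is the dash, as in A1448-link-metaspades.scafSeq. The function name `sanitize_names` is hypothetical and not part of PHYLUCE or MitoFinder. Whether PHYLUCE also needs a particular file extension is not covered in this thread, so the sketch leaves extensions untouched.

```shell
#!/bin/sh
# Hypothetical helper: rename every regular file in a directory so that
# dashes in the file name become underscores.
sanitize_names() {
    dir="$1"
    for path in "$dir"/*; do
        [ -f "$path" ] || continue
        base=$(basename "$path")
        clean=$(printf '%s' "$base" | sed 's/-/_/g')
        if [ "$base" != "$clean" ]; then
            # -n refuses to overwrite an existing file with the new name
            mv -n -- "$path" "$dir/$clean"
            echo "$base -> $clean"
        fi
    done
}

# Example call (path is a placeholder):
# sanitize_names /path/to/contigs
```

After this, A1448-link-metaspades.scafSeq would be named A1448_link_metaspades.scafSeq, while files without dashes are left as they are.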