Open jolbi opened 3 months ago
As a starting point, I would suggest simplifying the headers in the FASTA file used to create your BLAST DB and then redoing your BLAST search, e.g. seqkit -i
I simplified the FASTA headers to include only the sequence ID (e.g. Q8RWX4) and checked for potential duplicates. I created a new BLAST DB and reran BLAST, but the error stays the same. For some reason, the target IDs in the TSV file include the string `sp|<actual_id>|`.
Because of this, I also tried using IDs with a prefix (e.g. uniprot_Q8RWX4). The IDs in the TSV now match the FASTA headers, but the error stays the same.
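For anyone wanting to reproduce the header simplification and duplicate check described above, here is a minimal Python sketch. It is only an illustration under assumptions: the function names (`simplify_header`, `simplify_fasta`, `duplicated_ids`) are hypothetical, the example records are made-up placeholders, and it assumes UniProt-style headers of the form `sp|<accession>|<name>`.

```python
from collections import Counter


def simplify_header(header: str) -> str:
    """Return a short ID for one FASTA header (without the leading '>')."""
    tokens = header.split()
    first_token = tokens[0] if tokens else ""
    parts = first_token.split("|")
    # Assumption: UniProt-style IDs look like 'sp|Q8RWX4|NAME'; keep the accession.
    if len(parts) >= 3 and parts[0] in ("sp", "tr"):
        return parts[1]
    return first_token


def simplify_fasta(lines):
    """Yield FASTA lines with every header reduced to its short ID."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + simplify_header(line[1:])
        else:
            yield line


def duplicated_ids(lines):
    """Return the sorted IDs that occur in more than one header."""
    ids = [line[1:] for line in lines if line.startswith(">")]
    return sorted(i for i, n in Counter(ids).items() if n > 1)


# Placeholder records, not real database entries.
example = [
    ">sp|Q8RWX4|PLACEHOLDER some description",
    "MKV",
    ">Q8RWX4 second copy with the same accession",
    "MKV",
]
simplified = list(simplify_fasta(example))
print(simplified)
print(duplicated_ids(simplified))
```

Any duplicates reported this way would need to be resolved before rebuilding the BLAST DB from the simplified file.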
What else can I try? I'm running out of ideas.
Should I rerun some previous stages of mikado after creating new BLAST results?
I did not rerun any; I only updated the path to the BLAST target FASTA in the configuration YAML (for `--json-conf`).
Edit: I am using Mikado v2.3.4
The error `Cannot use a compiled regex as replacement pattern with regex=False` is a pandas error referring to the `str.replace` function, used here: https://github.com/EI-CoreBioinformatics/mikado/blob/69abafef128d2441e20126159e6d480ff9c2e956/Mikado/serializers/blast_serializer/tabular_utils.py#L266-L267
From pandas version >=2, the `regex` argument defaults to `False`. I added `regex=True` to both lines and serialise now works.
My installation of Mikado installed with mamba uses pandas v2.2.2.
I'm not familiar with forking and issuing commits, so I'll leave this to you :)
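For reference, the behaviour can be reproduced outside Mikado with a few lines of pandas. This is a minimal sketch, not Mikado's actual code: the series contents and the pattern are hypothetical, chosen only to resemble the IDs discussed in this thread, and it assumes pandas >= 2.

```python
import re

import pandas as pd

# Hypothetical target IDs resembling the ones mentioned in this thread.
ids = pd.Series(["sp|Q8RWX4|", "uniprot_Q8RWX4"])

# Hypothetical pattern; the one used in tabular_utils.py may differ.
pattern = re.compile(r"^sp\|([^|]+)\|$")

# Passing a compiled regex together with regex=False raises a ValueError.
# With pandas >= 2 the same happens when regex is left at its default.
try:
    ids.str.replace(pattern, r"\1", regex=False)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# Passing regex=True explicitly lets the compiled pattern be used.
cleaned = ids.str.replace(pattern, r"\1", regex=True)
print(cleaned.tolist())  # ['Q8RWX4', 'uniprot_Q8RWX4']
```

Setting `regex=True` explicitly should behave the same on older pandas versions, where it was the default.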
Hi,
When running `mikado serialise` with BLAST TSV results, I get the error: `Cannot use a compiled regex as replacement pattern with regex=False`
Serialise log:
Command used:
First 20 lines of $out_dir/mikado_prepared.blast.tsv:
I tried using `--force` and `--blast_targets $target_fasta`, and changing `--tsv` to `--xml`, but the error stays the same. I am confused about the `--blast_targets` flag. When is it needed?
The BLAST TSV was constructed with:
I could find only this similar issue that mentions regular expressions when importing BLAST data: https://github.com/EI-CoreBioinformatics/mikado/issues/392
Should I try to rerun `makeblastdb` without `-parse_seqids`? It was also mentioned in another issue that some combination of BLAST result columns has to be unique, but the error there was different... I did not do any filtering of the BLAST results.
Any ideas what is causing the problem?