DominikBuchner / BOLDigger3

MIT License

No hits below 94% similarity #5

Open vanessamata opened 1 week ago

vanessamata commented 1 week ago

Hi Dominik!

Thanks for quickly updating BOLDigger to support BOLD V5!

I've noticed that there now seem to be a lot of "no matches", and when checking the hits there are none below 94% similarity, despite having selected database 3 (animal library, public + private) and mode 3 (Exhaustive Search). This also happens on the website, so I guess it is not a problem with BOLDigger itself but rather some kind of bug in their identification engine. I have no idea when this will be fixed (I have already emailed them), so I wonder if BOLDigger (perhaps v2?) could keep accessing BOLD V4 instead? I'm trying to identify things from Africa, so a lot of the hits are below 94%. On BOLD V4 I managed to get hits with 90-93% similarity, but I get "no match" on V5.

Thanks!!

Vanessa

PS: Right now the identification engine of V4 seems to be down though... at least I haven't been able to get any results.

DominikBuchner commented 1 week ago

Hi Vanessa, the BOLD V4 API is also down, so without major updates to the code this is not easily doable. It might be possible to unlock different search parameters with some computer magic; I'll take a look. Until then, our best bet is to contact BOLD regarding the issue. I believe it should be an easy fix for them, and I also believe mode 3 and database 3 should at least go down to 85%, as they state on the website.

Edit: Please let me know as soon as you get a response from them, this is really interesting.

vanessamata commented 1 week ago

I got a reply saying they would look into it and get back to me as soon as possible, and 20 minutes later they asked for an example of a no match and of something that would only report hits above 94%. I have provided example sequences this morning and now I am waiting for feedback :) fingers crossed that they solve the issue! :)

DominikBuchner commented 1 week ago

Perfect, thank you very much. When they fix it, BOLDigger3 will automatically adjust!

Anto007 commented 1 week ago

Hi @DominikBuchner

I've got a somewhat similar issue to the one reported here. Below are the results from BOLDigger3 for my COI ASVs:

[image]

And below are the results from BOLDigger2 (generated months ago) for the same COI ASVs:

[image]

Obviously, the number of "No matches" is high in the BOLDigger3 results, presumably due to the 85% similarity cutoff in mode 3 in BOLD v5. BOLDigger2 used different thresholds for the taxonomic levels (97%: species level, 95%: genus level, 90%: family level, 85%: order level, <85% and >= 50%: class level) to find the best-fitting hit, but I guess going down to 50% identity to get class-level classifications is not going to be possible anymore when using the BOLD v5 database?
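For readers following along, the threshold scheme quoted above can be sketched in a few lines of Python. This is only an illustration of the percentages listed in this comment, not BOLDigger's actual code: the names `THRESHOLDS` and `rank_for_similarity` are hypothetical, and treating each cutoff as inclusive (`>=`) is an assumption.

```python
# Cutoffs as quoted in this thread (percent similarity -> most specific rank).
# Hypothetical names; not taken from the BOLDigger source code.
THRESHOLDS = [
    (97.0, "species"),
    (95.0, "genus"),
    (90.0, "family"),
    (85.0, "order"),
    (50.0, "class"),
]


def rank_for_similarity(similarity, min_similarity=50.0):
    """Return the most specific rank a hit supports, or None for "no match".

    min_similarity is the lowest similarity that is still reported at all.
    Assumption: cutoffs are inclusive.
    """
    if similarity < min_similarity:
        return None
    for cutoff, rank in THRESHOLDS:
        if similarity >= cutoff:
            return rank
    return None
```

Calling `rank_for_similarity(84.0)` gives `"class"` under the old scheme, while `rank_for_similarity(84.0, min_similarity=85.0)` gives `None`, which is roughly the difference being discussed here.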

DominikBuchner commented 1 week ago

Yes, and I would not trust anything below 85% anyway, so I removed the lower threshold entirely. I think this is actually a good improvement by BOLD, if only it were working.

Anto007 commented 1 week ago

Thanks for your response @DominikBuchner and I understand your point. However, a consequence of this will now be that a major chunk of our sequences from eDNA sequencing of poorly characterized environments is going to be reported as "Unclassified". I suppose the choice to go for a conservative approach or a liberal approach with respect to assigning taxonomy might be considered somewhat subjective and context-dependent.

DominikBuchner commented 1 week ago

I think if we actually get 85%+ soon this will be much less of an issue.

Anto007 commented 1 week ago

But my BOLDigger3 results above are from this morning and they have clearly taken hits > 85%. I didn't see a <94% problem at least in my results

DominikBuchner commented 1 week ago

Can you send the file with all results to my work email address? I'd like to have a look, because I was able to reproduce the issue described!

Anto007 commented 1 week ago

Sorry, I'm unable to follow you. You mean to say you too got results files that did not have results <94%? My results sheet from this morning has got plenty of hits at around 85% and I had used boldigger3-1.1.2 (which I've upgraded further to 3-1.1.4 at this very minute but I'm yet to test this upgraded version)

DominikBuchner commented 1 week ago

So you get results between 85% and 94%? That would mean that they fixed it immediately.

vanessamata commented 1 week ago

Perhaps they fixed it already? I will try again to see if it is solved!

Anto007 commented 1 week ago

Yes, I didn't notice a <94% identity problem in my BOLDigger3 results from this morning's run (for example, see the 88.854% identity result for my ASV_9 in the BOLDigger3 screenshot I posted earlier in this thread). I haven't had a chance to run BOLDigger3 again today, so I don't know if something broke at the BOLD server end later on, for example in the afternoon.

DominikBuchner commented 1 week ago

Hi all, short update on that matter:

TLDR: No hits below 94% for now; I'm in contact with BOLD.

DominikBuchner commented 1 week ago

Update: BOLD is aware of the problem and actively working to fix it.

Anto007 commented 1 week ago

@DominikBuchner That's strange; perhaps I got lucky then. Here are my input ASVs FASTA file and the identification results that I got from BOLDigger3. Test_ASVs_identification_result.xlsx Test_ASVs.fa.txt There are 7033 ASV sequences in total and you'll notice that many of them are indeed <90% identity hits. You seem to have closed this issue, but have the BOLD admins told you that they have fixed it? Just curious.

DominikBuchner commented 1 week ago

Not fixed yet, but it seems to be a minor issue. What was the operating mode and db for your fasta?

Anto007 commented 1 week ago

--db 3 and --mode 3. The total run time was an impressive 3 hours.

DominikBuchner commented 1 week ago

This is really strange. I can confirm that with your .fasta I also get results between 85% and 94%, but I was not able to reproduce this with any other file! Can you tell me which operating system this file was produced on?

Anto007 commented 1 week ago

@DominikBuchner It was generated on an Ubuntu 20.04 LTS OS after running dada2 and some final ASV filtering steps such as removal of NuMTs, non-Eukaryota ASVs and so on.

DominikBuchner commented 1 week ago

Hm okay, I officially don't get it. Let's see if I get a positive response from BOLD, so far, it does not seem to be fixed (except for Anto's file :D)

Anto007 commented 1 week ago

Very odd... What if you made a new hybrid test FASTA file containing, for example, 10 of my sequences and 10 of yours?
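A minimal sketch of how such a hybrid test file could be put together, using only the Python standard library. The function names, the `A|`/`B|` header prefixes and the fixed random seed are all illustrative assumptions, not something taken from BOLDigger3.

```python
import random


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there is one.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_hybrid_fasta(path_a, path_b, out_path, n=10, seed=0):
    """Write up to n randomly chosen records from each input into one FASTA file.

    Headers are prefixed with "A|" or "B|" so the origin of every sequence
    stays visible in the identification results.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    picked = []
    for tag, path in (("A", path_a), ("B", path_b)):
        records = read_fasta(path)
        for header, seq in rng.sample(records, min(n, len(records))):
            picked.append((f"{tag}|{header}", seq))
    with open(out_path, "w") as out:
        for header, seq in picked:
            out.write(f">{header}\n{seq}\n")
    return picked
```

If only the sequences from one source return hits below 94%, the difference lies in the sequences themselves rather than in the file as a whole.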

DominikBuchner commented 1 week ago

Maybe a good idea. Will test tomorrow.

vanessamata commented 1 week ago

Interesting... I've tried re-running single sequences on the website and I still get the same issue: no matches, or only matches >94%. It's very odd that for one specific FASTA file it does provide results >85%... I am very confused! I haven't had any feedback from BOLD since I provided example sequences :(

DominikBuchner commented 1 week ago

So I got feedback from BOLD: they say the website is working as intended and there is no bug. I'm as confused as you are, but will perform further tests. They will publish an API around January/February, which will speed up the whole process once implemented in BOLDigger3. The other bug I reported was resolved today, so no more "unavailable" process IDs. I believe the issue has something to do with the formatting of Anto's file, so I will do some bug-testing with the same data in different formats. Will keep you updated.

vanessamata commented 1 week ago

their website is working so well that I have been waiting for over 10 minutes for a single sequence with no luck... boldigger3 also seems to be stuck. Maybe their identification engine is down?

V4 seems to be working again fine though xD would it be difficult to update boldigger2 to use the new address of V4?

DominikBuchner commented 1 week ago

I'll check the options here. I believe the old bold API is down, but will check tomorrow! Really sorry about the mess this is causing, no advertisement for genetic methods tbh.

Regarding v5: just waiting helps a lot :D

DominikBuchner commented 1 week ago

It appears to be the length of the sequence. Sequences shorter than 225 bp produce this bug. I have reported it and am waiting for a response.
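To check how many sequences in a given file fall under that length, something like the following sketch could be used. It assumes the 225 bp cutoff reported above; `iter_fasta`, `split_by_length` and `MIN_LENGTH` are hypothetical names, and ignoring gap characters when measuring length is an assumption as well.

```python
MIN_LENGTH = 225  # bp, the cutoff reported in this thread


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, min_length=MIN_LENGTH):
    """Split records into (shorter than min_length, at least min_length)."""
    short, long_enough = [], []
    for header, seq in records:
        # Assumption: alignment gaps ("-") do not count towards the length.
        length = len(seq.replace("-", ""))
        if length < min_length:
            short.append((header, seq))
        else:
            long_enough.append((header, seq))
    return short, long_enough
```

Passing an open file handle to `iter_fasta` and the result to `split_by_length` gives the two groups, which can then be submitted separately to see whether only the short sequences are restricted to hits above 94%.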

Anto007 commented 1 week ago

Ohhh...weird but great to know that you were finally able to identify the bug here