TillMacher / apscale_gui

Advanced Pipeline for Simple yet Comprehensive AnaLysEs of DNA metabarcoding data
https://pypi.org/project/apscale-gui/
MIT License

NCBI BLAST and Local BLAST - Custom NCBI error #11

Open lukazuvic opened 2 months ago

lukazuvic commented 2 months ago

Dear Till, I hope you are well and that I am not bothering you too much. I am working on DNA metabarcoding of the stomach contents of Atlantic bluefin tuna by amplifying partial fragments of the COI and 18S gene regions. I am using APSCALE_GUI on Windows 11 for data processing, which is why I am contacting you.

In a nutshell: COI data processing is going smoothly, but I have a problem with the taxonomic assignment of my 18S data, i.e., with creating a local database (Local BLAST - Custom NCBI) and with using NCBI BLAST. When I try to do a taxonomic assignment via NCBI BLAST, I get this error: AttributeError: 'XlsxWriter' object has no attribute 'save'. Did you mean: '_save'?, and I also get an error message from Excel saying that the file cannot be opened because the file extension is not valid. If I use Local BLAST - Custom NCBI, on the other hand, everything seems to work fine until I have to filter the BLAST results, where this error occurs: ValueError: Length mismatch: Expected axis has 20 elements, new values have 11 elements. Attached are screenshots of the problem.

I have noticed that the software has options for the Silva and PR2 databases, but I have not found instructions on how to create these local databases. Could you please help me solve this problem, or write me instructions on how to create a local Silva database? That would be really helpful.

Thank you very much and best regards,

Luka
Local_BLAST_Custom_NCBI_error NCBI_blast_error2 NCBI_error1

TillMacher commented 2 months ago

Hi Luka,

I am currently working on a better solution for the local blast application. You can check it out here.

But maybe wait until tomorrow before you try it. I am just now fixing some bugs and preparing some last changes for the final launch :)

Also, APSCALE-GUI will probably be discontinued for now, since PySimpleGUI requires a paid subscription (and I use it for work). But APSCALE will still be updated, and we have added some nice features to the latest version!

cheers Till

lukazuvic commented 2 months ago

Dear Till, thank you for all the information. I tried apscale-blast and everything goes fine (I get all the output files - screenshot attached) until I have to execute the apscale_blast filter command, where I get this error:

C:\Users\lukaz>apscale_blast filter

APSCALE blast command line tool - v1.0.0 - 01/07/2024
Usage examples:
$ apscale_blast blastn -h
$ apscale_blast blastn -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -query_fasta ./12S_apscale_ESVs.fasta
$ apscale_blast filter -h
$ apscale_blast filter -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -blastn_folder ./12S_apscale_ESVs_blastn

Please enter PATH to database: C:\Users\lukaz\Desktop\Midori2_apscale_blast\db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST\db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST
Please enter PATH to blastn folder: C:\Users\lukaz\Desktop\COI_test2_aspcale_blast\testCOI_BFT4sl_apscale_OTUs_filtered\subsets
17:35:58: Starting to filter blast results for 'C:\Users\lukaz\Desktop\COI_test2_aspcale_blast\testCOI_BFT4sl_apscale_OTUs_filtered\subsets'
17:35:58: Your database: db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST
Traceback (most recent call last):
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\Scripts\apscale_blast.exe\__main__.py", line 7, in <module>
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 659, in main
    blastn_filter(args.blastn_folder, args.database, thresholds, args.n_cores)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 538, in blastn_filter
    merged_df = pd.concat(df_list, ignore_index=True)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\concat.py", line 382, in concat
    op = _Concatenator(
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\concat.py", line 445, in __init__
    objs, keys = self._clean_keys_and_objs(objs, keys)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\concat.py", line 507, in _clean_keys_and_objs
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

I was also wondering if you will upload the Silva SSU database to the server, and as for APSCALE-GUI, when you say it will be discontinued, that means I will not be able to use it anymore, regardless of whether I subscribe to PySimpleGUI?

Thank you and best regards, Luka

apscale_blast blastn_output apscale_blast blastn_output2 apscale_blast blastn_output3

TillMacher commented 2 months ago

Everything seems to be alright with the blastn results. Can you open the first subset file and show me the content?

Regarding your questions: Yes, I will upload Silva SSU by next week at the latest! And sadly yes, APSCALE-GUI will not be updated for now. Maybe in the future. But we have also made some substantial changes to APSCALE, which, in combination with the PySimpleGUI situation, make it really difficult for me to update the GUI currently.

lukazuvic commented 2 months ago

subset_1_blastn

Here it is :)

Thanks for the answers.

TillMacher commented 2 months ago

small_COI.fasta.zip Can you try this fasta file here? For me it works with the latest version of apscale_blast! And when did you download apscale_blast? I made some updates some hours ago. Maybe also try to update it?
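To confirm which release is actually installed before and after an upgrade, the version can be read from the package metadata. Below is a minimal sketch using only the standard library. The helper name `installed_version` is made up for illustration, and the distribution name `apscale-blast` is simply the PyPI name used in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this Python environment.
        return None
```

Calling `installed_version("apscale-blast")` in the same Python environment that runs the tool should show whether an upgrade (for example via `python -m pip install --upgrade apscale-blast`) has taken effect.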

lukazuvic commented 2 months ago

I updated it, and the same error appears.

I have also tried your file, and in the end the same error appears:

C:\Users\lukaz>apscale_blast blastn

APSCALE blast command line tool - v1.0.0 - 01/07/2024
Usage examples:
$ apscale_blast blastn -h
$ apscale_blast blastn -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -query_fasta ./12S_apscale_ESVs.fasta
$ apscale_blast filter -h
$ apscale_blast filter -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -blastn_folder ./12S_apscale_ESVs_blastn

Please enter PATH to database: C:\Users\lukaz\Desktop\Midori2_apscale_blast\db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST\db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST
Please enter PATH to query fasta: C:\Users\lukaz\Desktop\small_COI\small_COI.fasta\small_COI.fasta
C:\Users\lukaz\Desktop\small_COI\small_COI.fasta\small_COI.fasta
Traceback (most recent call last):
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\Scripts\apscale_blast.exe\__main__.py", line 7, in <module>
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 631, in main
    os.mkdir(Path(args.out))
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\Users\lukaz\Desktop\small_COI\small_COI\small_COI'
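A side note on the FileNotFoundError above: `os.mkdir` creates only the last path component, so it fails when a parent folder does not exist yet. That seems to be the case for the derived output path here, although this is a reading of the traceback and not something confirmed in the thread. The sketch below illustrates the failure mode and one way around it. The helper `make_output_dir` is hypothetical and is not how apscale-blast handles it.

```python
import os
import tempfile
from pathlib import Path


def make_output_dir(out: Path) -> Path:
    """Create `out` together with any missing parent folders."""
    out.mkdir(parents=True, exist_ok=True)
    return out


with tempfile.TemporaryDirectory() as tmp:
    # The intermediate folder "small_COI" does not exist yet.
    nested = Path(tmp) / "small_COI" / "small_COI"

    try:
        os.mkdir(nested)  # same kind of call as in the traceback
        plain_mkdir_failed = False
    except FileNotFoundError:
        plain_mkdir_failed = True

    # With parents=True the missing intermediate folder is created as well.
    created = make_output_dir(nested)
    nested_exists = created.is_dir()
```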

C:\Users\lukaz>apscale_blast blastn

APSCALE blast command line tool - v1.0.0 - 01/07/2024
Usage examples:
$ apscale_blast blastn -h
$ apscale_blast blastn -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -query_fasta ./12S_apscale_ESVs.fasta
$ apscale_blast filter -h
$ apscale_blast filter -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -blastn_folder ./12S_apscale_ESVs_blastn

Please enter PATH to database: C:\Users\lukaz\Desktop\Midori2_apscale_blast\db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST\db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST
Please enter PATH to query fasta: C:\Users\lukaz\Desktop\Small_COI\small_COI.fasta
C:\Users\lukaz\Desktop\Small_COI\small_COI.fasta
18:26:42 : Creating subset(s) from fasta file.
18:26:42 : Created 1 subset(s) from fasta file.
18:26:42: Starting blastn for 'small_COI'
18:26:42: Your database: db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST
18:30:20: Finished blastn for subset 1/1.
18:30:20: Finished blastn for 'small_COI'

C:\Users\lukaz>apscale_blast filter

APSCALE blast command line tool - v1.0.0 - 01/07/2024
Usage examples:
$ apscale_blast blastn -h
$ apscale_blast blastn -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -query_fasta ./12S_apscale_ESVs.fasta
$ apscale_blast filter -h
$ apscale_blast filter -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -blastn_folder ./12S_apscale_ESVs_blastn

Please enter PATH to database: C:\Users\lukaz\Desktop\Midori2_apscale_blast\db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST\db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST
Please enter PATH to blastn folder: C:\Users\lukaz\Desktop\Small_COI\small_COI\subsets
18:32:32: Starting to filter blast results for 'C:\Users\lukaz\Desktop\Small_COI\small_COI\subsets'
18:32:32: Your database: db_MIDORI2_UNIQ_NUC_SP_GB260_CO1_BLAST
Traceback (most recent call last):
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\Scripts\apscale_blast.exe\__main__.py", line 7, in <module>
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 659, in main
    blastn_filter(args.blastn_folder, args.database, thresholds, args.n_cores)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 541, in blastn_filter
    merged_df = pd.concat(df_list, ignore_index=True)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\concat.py", line 382, in concat
    op = _Concatenator(
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\concat.py", line 445, in __init__
    objs, keys = self._clean_keys_and_objs(objs, keys)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\reshape\concat.py", line 507, in _clean_keys_and_objs
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

TillMacher commented 2 months ago

Ok, I think I see the error.
Instead of: C:\Users\lukaz\Desktop\Small_COI\small_COI\subsets
Type: C:\Users\lukaz\Desktop\Small_COI\small_COI

You need to select the parent folder of "subsets"
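For anyone scripting around this step, the path correction described here can be sketched as a small helper. This is only an illustration: `normalize_blastn_folder` is a made-up name and not part of apscale-blast. The only assumption is the one stated in this thread, namely that the filter step expects the folder that contains "subsets".

```python
from pathlib import PureWindowsPath


def normalize_blastn_folder(folder: str) -> str:
    """Return the parent folder if `folder` points at the 'subsets' subfolder."""
    path = PureWindowsPath(folder)
    if path.name.lower() == "subsets":
        # The filter step wants the folder that contains "subsets".
        return str(path.parent)
    return str(path)
```

`PureWindowsPath` is used so that the backslash-separated paths from this thread are parsed the same way on any operating system.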

lukazuvic commented 2 months ago

Yes, it works now. Thank you for your time and help. filter_output

I am looking forward to the Silva SSU database :).

And one more question, will TTT still be available? Cheers Luka

TillMacher commented 2 months ago

I'll let you know when the SSU database will be available :)

Yes, I will update TTT with a new GUI. However, this will still take some time. But it will be much better than before!

Note: Remember to rename the first column in the taxonomy table to "ID" if you want to import it into the current TTT version. The "unique ID" column is already an update for TTT2.0
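The renaming mentioned in this note can also be done programmatically. Below is a minimal sketch with pandas, assuming the taxonomy table has already been loaded into a DataFrame. The helper name and the toy table are made up for illustration; real tables contain more columns.

```python
import pandas as pd


def rename_first_column_to_id(table: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `table` whose first column is named 'ID'."""
    first = table.columns[0]
    return table.rename(columns={first: "ID"})


# Toy stand-in for a taxonomy table (placeholder values only).
example = pd.DataFrame({"unique ID": ["OTU_1", "OTU_2"], "Species": ["sp. A", "sp. B"]})
renamed = rename_first_column_to_id(example)
```

`DataFrame.rename` returns a new object by default, so the original table keeps its "unique ID" column.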

TillMacher commented 1 month ago

Hey everyone,

I made an update to apscale-blast and also released new versions of the databases, including the Silva SSU database. The taxonomy used there is a bit wonky, so please let me know if issues occur.

cheers Till

onurdogan commented 1 month ago

Issue with pip install apscale-blast

Heya Luka and Till,

When I attempted to run the installation commands for apscale-blast 1.0.1, I encountered an issue with the pip install apscale-blast command.

After checking the installation directory, I noticed that the package was installed under the name apscale_blast (with an underscore, not a hyphen).

The correct installation command for me was:

pip install apscale_blast


Additionally, I've had an issue where some modules, such as cutadapt, don't work after downloading them for the first time. To resolve this, I copy the module from the /Library/Frameworks/Python.framework/Versions/3.10/bin directory into the /usr/local/bin folder, which makes it work as expected.


TillMacher commented 1 month ago

Hey guys,

it should be possible to download apscale-blast now using both "pip3 install apscale-blast" and "pip3 install apscale_blast". PyPI keeps showing the first command, but it works anyway.
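As background on why both spellings can refer to the same project: PEP 503 defines a normalized form of project names in which runs of "-", "_" and "." are treated as equivalent and case is ignored. The sketch below shows that normalization rule. It does not explain why the hyphenated command failed earlier in this thread, which remains unclear.

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()
```

Under this rule, `apscale_blast` and `apscale-blast` both normalize to `apscale-blast`. The command-line tool keeps the underscore spelling, `apscale_blast`, as seen in the console output in this thread.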

That's weird. I never had this installation problem on macOS. Good that you found a solution!

And maybe let's move to the issues section in apscale-blast for new discussions, so that people can find them in the correct place :)

cheers Till

lukazuvic commented 1 month ago

Hey Till, thank you for the information. I tried to do a taxonomic assignment using the Silva SSU database, and I get this error when I try to filter blastn results:

C:\Users\lukaz>apscale_blast blastn

APSCALE blast command line tool - v1.0.0 - 01/07/2024
Usage examples:
$ apscale_blast blastn -h
$ apscale_blast blastn -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -query_fasta ./12S_apscale_ESVs.fasta
$ apscale_blast filter -h
$ apscale_blast filter -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -blastn_folder ./12S_apscale_ESVs_blastn

Please enter PATH to database: C:\Users\lukaz\Desktop\SILVA_apscale_blast\db_SILVA_138.2_SSURef_NR99_tax_silva
Please enter PATH to query fasta: C:\Users\lukaz\Desktop\18S\test18S_BFT_apscale_OTUs_filtered.fasta
C:\Users\lukaz\Desktop\18S\test18S_BFT_apscale_OTUs_filtered.fasta
09:44:03 : Creating subset(s) from fasta file.
09:44:03 : Created 1 subset(s) from fasta file.
09:44:03: Starting blastn for 'test18S_BFT_apscale_OTUs_filtered'
09:44:03: Your database: db_SILVA_138
09:45:22: Finished blastn for subset 1/1.
09:45:22: Finished blastn for 'test18S_BFT_apscale_OTUs_filtered'

C:\Users\lukaz>apscale_blast filter

APSCALE blast command line tool - v1.0.0 - 01/07/2024
Usage examples:
$ apscale_blast blastn -h
$ apscale_blast blastn -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -query_fasta ./12S_apscale_ESVs.fasta
$ apscale_blast filter -h
$ apscale_blast filter -database ./MIDORI2_UNIQ_NUC_GB259_srRNA_BLAST -blastn_folder ./12S_apscale_ESVs_blastn

Please enter PATH to database: C:\Users\lukaz\Desktop\SILVA_apscale_blast\db_SILVA_138.2_SSURef_NR99_tax_silva
Please enter PATH to blastn folder: C:\Users\lukaz\Desktop\18S\test18S_BFT_apscale_OTUs_filtered
09:46:53: Starting to filter blast results for 'C:\Users\lukaz\Desktop\18S\test18S_BFT_apscale_OTUs_filtered'
09:46:53: Your database: db_SILVA_138
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\_utils.py", line 72, in __call__
    return self.func(**kwargs)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 598, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 598, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 148, in filter_blastn_csvs
    df_2 = accession2taxonomy(df_1, taxid_dict, col_names_2, db_name)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 77, in accession2taxonomy
    accession = row[1].split('|')[1] + '.1'
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\Scripts\apscale_blast.exe\__main__.py", line 7, in <module>
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 659, in main
    blastn_filter(args.blastn_folder, args.database, thresholds, args.n_cores)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\apscale_blast\__main__.py", line 522, in blastn_filter
    Parallel(n_jobs = n_cores, backend='threading')(delayed(filter_blastn_csvs)(file, taxid_dict, i, n_subsets, thresholds, db_name) for i, file in enumerate(csv_files))
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 2007, in __call__
    return output if self.return_generator else list(output)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 745, in get_result
    return self._return_or_raise()
  File "C:\Users\lukaz\AppData\Local\Programs\Python\Python310\lib\site-packages\joblib\parallel.py", line 763, in _return_or_raise
    raise self._result
IndexError: list index out of range

I also noticed that there is a difference between 18S and COI blastn CSV tables, so I thought this might be causing an error.

18S_blastn COI_blastn

Cheers Luka

TillMacher commented 1 month ago

You used version 1.0.0 - please upgrade apscale-blast to 1.0.2 and then it should work!
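For context on the IndexError in the 1.0.0 traceback: the failing expression, `row[1].split('|')[1]`, raises as soon as the string contains no "|", because the split then yields a single-element list. Whether that is exactly how the Silva subject IDs differ from the other databases is an assumption here, and the sketch below is not the fix shipped in 1.0.2. It only illustrates a defensive version of that kind of parsing, with a hypothetical helper name.

```python
from typing import Optional


def accession_from_subject_id(subject_id: str) -> Optional[str]:
    """Return the second '|'-separated field, or None if there is none."""
    fields = subject_id.split("|")
    if len(fields) < 2:
        # No '|' in the ID: indexing fields[1] would raise IndexError.
        return None
    return fields[1]
```

Returning `None` instead of raising lets the caller decide how to handle IDs in an unexpected format, for example by reporting them to the user.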

lukazuvic commented 1 month ago

Ahhh yes, sorry I did not notice. It worked, but there is a misclassification at the taxonomic level, e.g. in the Species column the software sets Metazoa:

18S_apscale_blast

TillMacher commented 1 month ago

Ok, I will look into the taxonomy. The problem is that, unlike the other databases, the taxonomy is not really well structured in the fasta header. Do you already have experience with Silva, because I haven't worked with this database yet?

lukazuvic commented 1 month ago

Thank you. Unfortunately my first contact with this database was when I contacted you about a problem with taxonomic assignment when I was using apscale_gui local blast.

TillMacher commented 1 month ago

I uploaded an updated version of the Silva databases. However, the taxonomy is not really well structured, and the levels provided often differ between entries. At least the superkingdom, genus, and species should be correct now. I will work on this in future updates, but currently this is the best way to map the taxonomy onto the Apscale taxonomic levels.

lukazuvic commented 1 month ago

Thank you for the update, I have tested the updated database and yes, species and genus levels are now correct as you mentioned. Thank you for all your hard work, it really means a lot.

18S

Cheers, Luka

lukazuvic commented 1 month ago

Hi Till, sorry to bother you, please can you inform me when you make any further changes to the taxonomy of the Silva database?

Many thanks and cheers, Luka