Open yeroslaviz opened 3 years ago
Hi Assa! I have just updated the script, the executable, and their description in the readme, it should work with your input table now.
The script now accepts tables with different separators: comma, tabulation, semicolon.
Just remove all the lines where GenBank accession number is absent, and in the command, add an argument --column with the position of the column where GenBank numbers are stored (counting from zero). Default is --column 1 (that is, 2nd column).
About the SSL certificate error, I do not know what caused it, but after I updated the executable the error disappeared, try it now.
Thanks for the fast changes. I have downloaded the new annotate_blast_hits
standalone and tried to run it. I have removed the three comment lines from the beginning of my tab-delimited file, which now look like that at the top:
$head magicblast.nt.noComments.txt
NB500982:283:HH3WJAFX2:1:11101:24257:1057 - 0 0 0 0 0 0 0 0 0 99 0 - - 75 - 0 - - - - - - 0
NB500982:283:HH3WJAFX2:1:11101:19494:1058 - 0 0 0 0 0 0 0 0 0 99 0 - - 75 - 0 - - - - - - 0
...
But I get this error:
Error loading Python lib '/tmp/_MEIp2FtJz/libpython3.8.so.1.0': dlopen: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /tmp/_MEIp2FtJz/libpython3.8.so.1.0)`
Do I need to install some packages?
When trying to explicitly running it with python3 I get this error
$ python3 annotate_blast_hits magicblast.nt.noComments.txt n my@mailaddress -c 1
SyntaxError: Non-UTF-8 code starting with '\xe8' in file /local/Assa/projects/Tachibana/Chad/P286/annotate_blast_hits on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
The Pyinstaller documentation says:
The executable that PyInstaller builds is not fully static, in that it still depends on the system libc. Under Linux, the ABI of GLIBC is backward compatible, but not forward compatible. So if you link against a newer GLIBC, you can't run the resulting executable on an older system.
So it seems that your system has an older version of GLIBC than the environment where I compiled the standalone (it has GNU libc 2.31).
I guess the easiest solution for you would be to install package Biopython
and run the script file annotate_blast_hits.py
instead of the standalone.
Don't forget to make it executable: chmod +x annotate_blast_hits.py
In your input table, you would also need to remove the lines where you have "-" instead of GenBank accession number: the script assumes that you have GB accession in every line.
I hope this will work for you. Best, Oleg
Thanks, I think I got one step closer to running the script :-)
Unfortunately my server have either Xenial or Bionic as with older GLIBC versions. I guess you are running Ubuntu 20?
I have done the changes, and cloned the repository to have the newest version of it. I have installed the Biopython in my conda environment and tried to run the command again.
It had connected to GeneBank, but when trying to get the annotations is throws a new error:
$ python bioinf_scripts/annotate_blast_hits.py magicblast.nt.giIDfiltered.txt n my@mailaddress -c 1
Column separator: tabulation
Fetching annotation for 9729793 accessions from GenBank...
Traceback (most recent call last):
File "bioinf_scripts/annotate_blast_hits.py", line 138, in <module>
raise err
File "bioinf_scripts/annotate_blast_hits.py", line 94, in <module>
get(ids_to_fetch, step)
File "bioinf_scripts/annotate_blast_hits.py", line 78, in get
out.write(handle.read())
TypeError: write() argument must be str, not bytes
It is something in my file again?
thanks Assa
I'm using Manjaro Linux 19.0.2 (quite old already as well)
Wow, it's not getting easier :) I currently have no clue on what causes this error - it never occurred with any of my test data. Can you try to run it with the test input to see if it works? https://raw.githubusercontent.com/Gurdhhu/bioinf_scripts/master/blastn_example_input.csv
Good Morning, sorry for the late reply. Unfortunately it gives me the same error.
$ python bioinf_scripts/annotate_blast_hits.py bioinf_scripts/blastn_example_input.csv n my@mailaddress -c 1
Column separator: comma
Fetching annotation for 150 accessions from GenBank...
Traceback (most recent call last):
File "/local/Assa/projects/Tachibana/Chad/P286/bioinf_scripts/annotate_blast_hits.py", line 138, in <module>
raise err
File "/local/Assa/projects/Tachibana/Chad/P286/bioinf_scripts/annotate_blast_hits.py", line 100, in <module>
get(ids_to_fetch, step)
File "/local/Assa/projects/Tachibana/Chad/P286/bioinf_scripts/annotate_blast_hits.py", line 78, in get
out.write(handle.read())
TypeError: write() argument must be str, not bytes
Hi again,
so I have found an indirect solution.
I created a new virtual machine specifically for that reason with Ubuntu 20.04, this one has now GLIBC_2.31
.
I have tested your newest github version. The python script throws still the same error as above, but the executable now runs you example data. I am now running my file, which is a bit bigger, but for now no errors.
So it seems there is something wrong with the python script itself. Which is strange, because I would have guessed that it is the same one within the executable.
I'll keep you up-to-date, if I get new errors.
thanks for all the help.
Assa
Hi Assa,
I'm glad that you were able to make it work in spite of having to apply indirect solutions. The standalone was compiled based on the latest script version, so it is really weird that the script and the standalone behave differently. I was not able to reproduce this TypeError on my machine, and I still have no good idea about what causes it.
I hope you will not have any more problems with the script, but if you will, don't hesitate to contact me.
Best, Oleg
Hi,
I don't know id it is the intended solution, but by adding the binary flag the python script is working now
in line 77
with open(tmpfile, "wb") as out: # Writing a temporary file that will be removed after parsing
and in line 146
with open(tmp, "rb") as f:
now it works
That's interesting - I still do not understand why on your server handle.read()
for the output of the function Entrez.efetch
with retmode='xml'
returns binary data instead of xml.
Anyway, good that you have found a solution. I hope you finally were able to annotate your huge input table.
Good morning, unfortunately it still throws some errors,
I have splitted the BLAST results in 10 subsets to be able to do it in parallel, but it now throws the following error
$ python bioinf_scripts/annotate_blast_hits.py NEB_adapter/WK56.R2.unmapped.magicblast.nt.giIDfiltered_subset08.txt n yeroslaviz@biochem.mpg.de -c 1
Column separator: tabulation
Fetching annotation for 1000000 accessions from GenBank...
Traceback (most recent call last):
File "/home/yeroslaviz/miniconda3/lib/python3.6/http/client.py", line 556, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/home/yeroslaviz/miniconda3/lib/python3.6/http/client.py", line 523, in _read_next_chunk_size
return int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/yeroslaviz/miniconda3/lib/python3.6/http/client.py", line 573, in _readall_chunked
chunk_left = self._get_chunk_left()
File "/home/yeroslaviz/miniconda3/lib/python3.6/http/client.py", line 558, in _get_chunk_left
raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "bioinf_scripts/annotate_blast_hits.py", line 138, in <module>
raise err
File "bioinf_scripts/annotate_blast_hits.py", line 94, in <module>
get(ids_to_fetch, step)
File "bioinf_scripts/annotate_blast_hits.py", line 78, in get
out.write(handle.read())
File "/home/yeroslaviz/miniconda3/lib/python3.6/http/client.py", line 466, in read
return self._readall_chunked()
File "/home/yeroslaviz/miniconda3/lib/python3.6/http/client.py", line 580, in _readall_chunked
raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(1074161172 bytes read)
Any Ideas, what could have cause this?
thaks Assa
Hi,
I have run the magicblast tool to BLAST my fastq file. This gets me a tab-delimited output file with the fowllowing first three columns (and many more):
and (for now) looks like that
I can use
awk
to either reorder the columns or change it to acsv
file format. I just need to understand better what the input should look like.There is though a conflict between what is written in the description and your example input files, if I understand them correctly.
You are writing the accession number should be in the fourth column, but in you example input files, these numbers are in the second column. Where do I need to have them?
Another question is - do I need all the other columns? or can I just use the query and reference IDs?
I have run the executable with the output file i have now (after converting it to csv) and got the following error:
any Ideas?
thanks
Assa