DallasThomas / SACCHARIS

Improve functional predictions of uncharacterized sequences for any CAZyme or CBM family

Wide character in print at cazy_extract.pl line 241 #5

Closed: cmorganl closed this issue 5 years ago

cmorganl commented 5 years ago

Hi Dallas,

Sorry to be on here again. It looks like the new CAZy release has buggered up the cazy_extract.pl script once again. I used this command:

for f in GH17 GH19 GH125 GT9 GT28 GT41; do cazy_extract.pl --family $f --group all; echo $f; done

It looks like the script quits after issuing the "Wide character in print" warning, so I don't get as many sequences as I should. Can you reproduce this?

I'm also getting these warnings, but only for GH125:

Use of uninitialized value in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 238.
Use of uninitialized value $l[0] in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 239.
Use of uninitialized value in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 239.
Use of uninitialized value in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 238.
Use of uninitialized value $l[0] in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 239.
Use of uninitialized value in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 239.
Use of uninitialized value in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 238.
Use of uninitialized value $l[0] in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 239.
Use of uninitialized value in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 239.
Use of uninitialized value in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 238.
Use of uninitialized value $l[0] in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 239.
Use of uninitialized value in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 239.

Let me know if you require more info!

Thanks again! Connor

DallasThomas commented 5 years ago

Hello Connor,

Thanks for the information. I apologize for the delay in responding; I wanted to have a solution first. I have updated cazy_extract.pl, and the issues you brought up should be resolved in this version. I will not close the issue until I am sure this is the case.

So, what was going on here? :)

I think you brought up three issues. I will go through each of them; if there is anything I missed, please let me know.

1 - The "Wide character in print" warning. The good news is that this is just a warning. It has to do with characters sometimes not being encoded properly as UTF-8, and the way around it is to make sure you encode in UTF-8 whenever you read input or print output. Before making that change, I checked whether the FASTA output was actually being compromised by this warning during printing. Even in the cases where the warning showed up, the FASTA output validated properly. Given that, I just made sure the UTF-8 encoding was set and retested. All is good here.

2 - Not all sequences are being retrieved. I tested this a few times, both with a default run and with the Fragments flag set to false. In the default run, all entries that are considered fragments are removed, so the vast majority of the entries not retrieved are due to fragment screening. For GH17 with fragments not removed, I was still 9 entries short. I have found in the past that this is due to entries with missing accession numbers, or accession numbers that do not actually have an NCBI reference and hence no sequence.

3 - The split and concatenation warnings in GH125. These were caused by CAZy listing an accession number that ended with a '.' and no trailing digit. This threw off the parsing in the script and accounted for the warnings. I added a special case to cover this possibility; now GH125 runs without issues and the sequence in question is retrieved properly.
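For point 1: cazy_extract.pl is a Perl script, and in Perl the "Wide character in print" warning goes away once the output handle has a UTF-8 layer, for example binmode(STDOUT, ':encoding(UTF-8)'). The minimal Python sketch below illustrates the same idea of declaring the output encoding explicitly when writing FASTA. The helper name write_fasta, the 60-residue line width and the sample record are illustrative assumptions, not code from SACCHARIS.

```python
import os
import tempfile


def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs to `path` as FASTA, encoded as UTF-8.

    Hypothetical helper for illustration only; not part of SACCHARIS.
    """
    # Declaring the encoding explicitly (instead of relying on the platform
    # default) means non-ASCII characters in headers, such as "β", are
    # written out as valid UTF-8 bytes.
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            # Wrap the sequence at `width` residues per line.
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")


# Usage with a made-up record whose description contains a non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "example.fasta")
write_fasta([("EXAMPLE.1 β-1,3-glucanase (made-up record)", "M" * 70)], path)
with open(path, encoding="utf-8") as fh:
    print(fh.read())
```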
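For point 2, the bookkeeping behind the missing entries can be modelled roughly as follows: entries flagged as fragments are dropped when fragment removal is on (the default), and any remaining entry without an NCBI accession cannot yield a sequence. This is a hypothetical Python model; the CazyEntry fields and the screen function are assumptions for illustration, not the script's actual data structures. Accessions that exist but have no NCBI record would only show up after querying NCBI, so they are not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CazyEntry:
    """Hypothetical record for one row of a CAZy family table."""
    protein_name: str
    genbank: Optional[str]  # None (or "") when no NCBI accession is listed
    fragment: bool          # True if CAZy marks the entry as a fragment


def screen(entries: List[CazyEntry], remove_fragments: bool = True
           ) -> Tuple[List[CazyEntry], List[CazyEntry], List[CazyEntry]]:
    """Split entries into (retrievable, dropped_fragments, no_accession)."""
    kept, fragments, no_accession = [], [], []
    for entry in entries:
        if remove_fragments and entry.fragment:
            fragments.append(entry)       # removed by fragment screening
        elif not entry.genbank:
            no_accession.append(entry)    # nothing to fetch from NCBI
        else:
            kept.append(entry)
    return kept, fragments, no_accession


# Usage with made-up entries.
entries = [
    CazyEntry("enzyme A", "AAA00001.1", fragment=False),
    CazyEntry("enzyme B", "AAA00002.1", fragment=True),
    CazyEntry("enzyme C", None, fragment=False),
]
kept, frags, missing = screen(entries)
print(len(kept), len(frags), len(missing))
```

Counting each category separately makes it easy to see how much of a shortfall comes from fragment screening versus missing accessions.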
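For point 3, here is a minimal Python sketch of that kind of special case, assuming accessions take the form BASE or BASE.VERSION: a trailing '.' with no digit is treated as "no version" instead of breaking the parse. The regular expression, the function name split_accession and the example accessions are hypothetical, not the actual fix in cazy_extract.pl.

```python
import re

# BASE, optionally followed by "." and an optional numeric version.
# Accepting a bare trailing "." is the special case described above.
ACCESSION_RE = re.compile(r"^([A-Za-z0-9_]+)(?:\.(\d+)?)?$")


def split_accession(acc):
    """Return (base, version); version is None if absent or if acc ends in a bare '.'."""
    m = ACCESSION_RE.match(acc.strip())
    if m is None:
        raise ValueError(f"unrecognised accession: {acc!r}")
    base, version = m.group(1), m.group(2)
    return base, int(version) if version else None


# Usage with made-up accessions.
for acc in ["ABC12345.1", "ABC12345.", "ABC12345"]:
    print(acc, split_accession(acc))
```

Raising an error on anything that does not match keeps unexpected formats visible instead of letting an undefined value propagate into later steps.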

I hope this helps. If I have missed something or you find another error, please let me know. Once again, I will not close this issue until everything is working for you, so please let me know how the re-run goes.

Thanks, and have a great weekend.

Dallas

cmorganl commented 5 years ago

Hi Dallas,

Thanks for the quick reply! This all makes sense. I pulled down the most recent version and re-ran GH125, but there were many more warnings than before:

Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 474.
Use of uninitialized value $count in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 478.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 482.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 485.
Use of uninitialized value $key in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $web in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $efetch_result in substitution (s///) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 498.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 474.
Use of uninitialized value $count in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 478.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 482.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 485.
Use of uninitialized value $key in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $web in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $efetch_result in substitution (s///) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 498.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 474.
Use of uninitialized value $count in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 478.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 482.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 485.
Use of uninitialized value $key in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $web in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $efetch_result in substitution (s///) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 498.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 474.
Use of uninitialized value $count in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 478.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 482.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 485.
Use of uninitialized value $key in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $web in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $efetch_result in substitution (s///) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 498.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 474.
Use of uninitialized value $count in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 478.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 482.
Use of uninitialized value $esearch_result in pattern match (m//) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 485.
Use of uninitialized value $key in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $web in concatenation (.) or string at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 493.
Use of uninitialized value $efetch_result in substitution (s///) at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 498.
Use of uninitialized value $_ in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 208.
Use of uninitialized value $_ in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 208.
Use of uninitialized value $_ in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 208.
Use of uninitialized value $_ in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 208.
Use of uninitialized value $_ in split at /home/cmorganlang/bin/SACCHARIS/cazy_extract.pl line 208.

There are now these warnings for other families too.

DallasThomas commented 5 years ago

Hello Connor,

Can you try it again with this copy? Dumb mistake on my part: I got my copy working on my machine, but instead of uploading that copy I just modified the one already on GitHub. I should have just uploaded what I had the first time.

There should not be any errors now. Famous last words! It's been one of those weeks, and we are only on day 2.

cmorganl commented 5 years ago

Hi Dallas,

Apologies for not responding earlier. That last push seems to have fixed it!

Thanks again!