bg7 / BG7

bacterial genome annotation system
bg7.ohnosequences.com

Some fixes I had to implement on the latest BG7 version. #41

Open mcoimbra opened 10 years ago

mcoimbra commented 10 years ago

Dear developers,

Whilst attempting to use your software, I found a number of problems, which I fixed manually. Perhaps these fixes will be of some help.

The first problem I encountered was in the bin/bg7 shell script, on the blastn and tblastn validation lines. My ncbi-blast install is version 2.2.28+, and I found that the generated .xml output files all had a blank line at the bottom. Therefore, the validation lines:

Line 241: rnaBlastOk=$(tail -n 1 ${rnasVsContigsOutputPath} | grep -c '</BlastOutput>')
Line 279: proteinBlastOk=`tail -n 1 ${proteinsVsContigsOutPath} | grep -c '</BlastOutput>'`

Had to be changed to:

Line 241: rnaBlastOk=$(tail -n 2 ${rnasVsContigsOutputPath} | grep -c '</BlastOutput>')
Line 279: proteinBlastOk=`tail -n 2 ${proteinsVsContigsOutPath} | grep -c '</BlastOutput>'`

The -n argument of the "tail" program had to change from 1 to 2, so that the closing tag is still captured despite the blank line at the end of the file.
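A check that ignores trailing blank lines altogether would not depend on how many of them a given BLAST version appends. The following is only a minimal sketch, assuming the XML output ends with a closing </BlastOutput> tag as described above; the function name blast_xml_complete is hypothetical and is not part of bin/bg7.

```shell
#!/bin/sh
# Hypothetical helper (not part of bin/bg7): succeeds when the last
# non-blank line of the given file contains the closing </BlastOutput> tag.
blast_xml_complete() {
    # Drop empty or whitespace-only lines, keep the last remaining line,
    # and search it for the closing tag. The exit status is that of grep -q.
    grep -v '^[[:space:]]*$' "$1" | tail -n 1 | grep -q '</BlastOutput>'
}
```

It could then be used as a condition, for example `if blast_xml_complete "${rnasVsContigsOutputPath}"; then ...; fi`, regardless of whether the file ends with zero, one, or several blank lines.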

The second problem I found is within the directory structure of BG7. The BG7 jar file is jars/BG7.jar. However, the bg7 shell script looks for it as "jar/bg7.jar". I had to change this:

cp $BG7_HOME/jar/bg7.jar $output_folder/
echo "running bg7 now!"
java -d64 -Xmx6G -Xms1G -jar $output_folder/bg7.jar
rm -f $output_folder/bg7.jar

To this:

cp $BG7_HOME/jars/BG7.jar $output_folder/
echo "running bg7 now!"
java -d64 -Xmx6G -Xms1G -jar $output_folder/BG7.jar
rm -f $output_folder/BG7.jar

This corrects both the jar directory name (jars, not jar) and the case of the jar file name (BG7.jar, not bg7.jar).
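A script could also fail early with a clear message when the jar is not where it expects it. The sketch below is an illustration only: the function name find_bg7_jar is hypothetical, and the two candidate paths are simply the ones mentioned in this issue.

```shell
#!/bin/sh
# Hypothetical helper (not part of bin/bg7): print the path of the first
# BG7 jar found under the given BG7 home directory, or fail with a message.
find_bg7_jar() {
    for candidate in "$1/jars/BG7.jar" "$1/jar/bg7.jar"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "BG7 jar not found under $1" >&2
    return 1
}
```

For example, `bg7_jar=$(find_bg7_jar "$BG7_HOME") || exit 1` would stop the script before the cp and java steps if neither path exists.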

Finally, the template execution file for the PredictGenes task was lacking a parameter, which I added using the default DIF_SPAN value (given as 30 in the PredictGenes.java file). To do this, I changed the following lines in the bg7 script:

<class_full_name>com.era7.bioinfo.annotation.PredictGenes</class_full_name>
<arguments>
  <argument>${name}_proteins_tBLASTn.xml</argument>
  <argument>${name}_sequences.fna</argument>
  <argument>${name}_PredictedGenes.xml</argument>
  <argument>400</argument>
  <argument>true</argument>
</arguments>

To:

<class_full_name>com.era7.bioinfo.annotation.PredictGenes</class_full_name>
<arguments>
  <argument>${name}_proteins_tBLASTn.xml</argument>
  <argument>${name}_sequences.fna</argument>
  <argument>${name}_PredictedGenes.xml</argument>
  <argument>400</argument>
  <argument>true</argument>
  <argument>30</argument>
</arguments>

With this change, the last argument is taken into account when BG7.jar is invoked in the last part of the script. I also changed the "executionsTemplate.xml" file that comes in the BG7 directory, adding the same parameter line in the same place. So this:

<execution>
    <class_full_name>com.era7.bioinfo.annotation.PredictGenes</class_full_name>
    <arguments>
        <argument>XX_proteins_tBLASTn.xml</argument>
        <argument>XX_sequences_header_fixed.fna</argument>
        <argument>XX_PredictedGenes.xml</argument>
        <argument>400</argument>
        <argument>false</argument>
    </arguments>
</execution>

Changed to this:

<execution>
    <class_full_name>com.era7.bioinfo.annotation.PredictGenes</class_full_name>
    <arguments>
        <argument>XX_proteins_tBLASTn.xml</argument>
        <argument>XX_sequences_header_fixed.fna</argument>
        <argument>XX_PredictedGenes.xml</argument>
        <argument>400</argument>
        <argument>false</argument>
        <argument>30</argument>
    </arguments>
</execution>
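To confirm that an executions file really carries the extra parameter, the number of arguments passed to PredictGenes can be counted. This is a rough sketch, assuming one tag per line as in the template above and a single PredictGenes block per file; the function name count_predictgenes_args is hypothetical.

```shell
#!/bin/sh
# Hypothetical check (not part of BG7): print how many <argument> lines
# belong to the PredictGenes <execution> block of an executions XML file.
count_predictgenes_args() {
    awk '
        # Start counting only when the class line names PredictGenes.
        /<class_full_name>/ { in_block = ($0 ~ /annotation\.PredictGenes</) }
        # Stop counting at the end of that argument list.
        /<\/arguments>/ { in_block = 0 }
        in_block && /<argument>/ { n++ }
        END { print n + 0 }
    ' "$1"
}
```

Run against the corrected template, `count_predictgenes_args executionsTemplate.xml` should report six arguments rather than five.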

Finally, after doing all of the above, I managed to get bg7 working properly. I hope this is of some help to the team.

Out of curiosity, when do you expect to publish a release with these fixes, so that BG7 works "out of the box"?

Thanks for your time!

ehsueh commented 10 years ago

Hi mcoimbra, I came across the same problems with BG7 that you had 11 months ago. Thank you very much for this post. It saved me a lot of time. :) Did you also encounter an ArrayIndexOutOfBoundsException in PredictGenes?