AgBase / InterProScan

Code for building InterProScan docker container and supporting scripts
https://agbase-docs.readthedocs.io/en/latest/interproscan/intro.html
Creative Commons Zero v1.0 Universal

Missing opening tag in merged XML file #17

Closed: suryasaha closed this issue 3 years ago

suryasaha commented 3 years ago

The merged XML file is missing its opening tag, so xmllint rejects it:

surya@malus:~/work/NAL/i5k_first_10_set/APLA.faa/def-45-80-APLA_full$ xmllint APLA.xml
APLA.xml:46911: parser error : Extra content at the end of the document
  <protein>
  ^

It looks like the merging step drops the XML declaration and the opening root tag, <?xml version="1.0" encoding="UTF-8"?><protein-matches xmlns="http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5" interproscan-version="5.45-80.0">

So the merged file ends up with a bunch of dangling </protein-matches> closing tags that have no matching opening tag.
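
A quick way to confirm the imbalance is to count the root opening and closing tags in the merged file. This is only a rough sketch: `count_root_tags` is a hypothetical helper name, and it counts matching lines rather than tag occurrences, so it assumes each tag sits on its own line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of iprs_wrapper.sh): report how many lines
# contain the root opening tag and how many contain the root closing tag.
count_root_tags() {
    local file="$1"
    local opens closes
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    opens=$(grep -c '<protein-matches' "$file" || true)
    closes=$(grep -c '</protein-matches>' "$file" || true)
    printf 'open=%s close=%s\n' "$opens" "$closes"
}
```

For a well-formed merged file both counts should be 1, for example when running `count_root_tags APLA.xml`.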

suryasaha commented 3 years ago

Possibly something I'm missing. Why do we remove the XML header lines? @amcooksey https://github.com/AgBase/InterProScan/blob/08ae6068f4771b168426a3ccd8c3032f4c49da92/5.45-80/iprs_wrapper.sh#L227

suryasaha commented 3 years ago

Looks like the protein-matches closing tag should also be removed from every split XML file and then added once at the end of the final merged XML file.
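
The merge described above could be sketched roughly as follows. This is not the code in `iprs_wrapper.sh`; `merge_split_xml` is a hypothetical function name, and the sketch assumes every split file starts with the XML declaration and the <protein-matches ...> opening tag, and ends with </protein-matches> on a line of its own.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: merge split InterProScan XML files into one document
# with a single header and a single closing root tag.
# Usage: merge_split_xml OUTPUT.xml SPLIT1.xml [SPLIT2.xml ...]
merge_split_xml() {
    local out="$1"
    shift
    local first="$1"
    local f

    # Copy the header (XML declaration plus root opening tag) from the first
    # split file: print lines up to and including the first one that
    # contains "<protein-matches".
    awk '{ print } /<protein-matches/ { exit }' "$first" > "$out"

    # From every split file, keep only the lines after the root opening tag,
    # skipping any line that holds the root closing tag.
    for f in "$@"; do
        awk 'body && !/<\/protein-matches>/ { print }
             /<protein-matches/ { body = 1 }' "$f" >> "$out"
    done

    # Close the root element exactly once.
    printf '%s\n' '</protein-matches>' >> "$out"
}
```

The result can then be checked with `xmllint --noout` on the merged file, which prints nothing when the document is well-formed.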