chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

Warning in GFFParser: BiopythonDeprecationWarning: UnknownSeq(length) is deprecated; please use Seq(None, length) instead. #143

Open victorlin opened 8 months ago

Scope

bcbio-gff 0.7.1 and Biopython 1.79 or 1.80.

Description

The feature in question was deprecated in Biopython 1.79 and removed in 1.81. While GFFParser has been written to handle the removal, it had silently ignored the deprecation warning until 03e96caa2b91cce22b6d05cf5d5f473cccbb7eb2 was released in bcbio-gff v0.7.1.

This was caught by a test in one of my repos after the recent release of 0.7.1. Error message snippet:

+  /home/runner/micromamba/envs/augur/lib/python3.8/site-packages/Bio/Seq.py:2220: BiopythonDeprecationWarning: UnknownSeq(length) is deprecated; please use Seq(None, length) instead.

Workarounds

  1. Silence deprecation warnings (similar to reverting 03e96caa2b91cce22b6d05cf5d5f473cccbb7eb2).
  2. Install biopython>=1.81, with which GFFParser takes a code path that avoids UnknownSeq: https://github.com/chapmanb/bcbb/blob/9c6d83ee3f0491f647a9ecd5947b13c99b478f26/gff/BCBio/GFF/GFFParser.py#L33-L40
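
For workaround 1, a minimal sketch using only the standard library `warnings` module might look like the following. The helper name `silence_unknownseq_warning` is hypothetical, and the filter matches on the message text from the snippet above rather than on the warning class, so it does not need to import Biopython.

```python
import warnings


def silence_unknownseq_warning():
    """Ignore only the UnknownSeq deprecation message (hypothetical helper)."""
    # `message` is a regex matched against the start of the warning text,
    # so the parentheses have to be escaped. Other warnings are unaffected.
    warnings.filterwarnings(
        "ignore",
        message=r"UnknownSeq\(length\) is deprecated",
    )
```

Calling this once before importing or using the GFF parser should be enough in a script; in a test suite, the equivalent filter would more likely go in the test runner's warning configuration.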

Potential solution

Remove references to UnknownSeq and solely use the Seq approach. Seq(None, length) should be available with 1.79, maybe even earlier.
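
As a rough illustration of the Seq-only approach (not the actual GFFParser code), something like the sketch below could stand in for the UnknownSeq call. The function name `make_undefined_seq` is hypothetical, and this assumes a Biopython version that accepts `Seq(None, length)`.

```python
def make_undefined_seq(length):
    """Return a Seq of known length but undefined content (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without Biopython installed.
    from Bio.Seq import Seq

    # Seq(None, length) is the replacement named in the deprecation message
    # for UnknownSeq(length).
    return Seq(None, length)
```

The returned object still reports its length via `len()`, which is what placeholder sequences generally need.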