etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Improve documentation for read_auto() #618

Closed BJWiley233 closed 2 years ago

BJWiley233 commented 3 years ago

Hi Authors,

Please add a note to the documentation of read_auto that the files being read in CANNOT have a header. The same note should also be placed in the documentation here; something as simple as "be aware that annotation files should not have a header" would do. UCSC files are downloaded with headers, and if you leave the header in you get an error like:

Detected file format: refflat
/Users/brian/anaconda3/envs/cnvkit/lib/python3.8/site-packages/cnvlib/target.py:20: DtypeWarning: Columns (4,5) have mixed types.Specify dtype option on import or set low_memory=False.
  annotation = tabio.read_auto(annotate)
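
A minimal sketch of why a leftover header line can produce this kind of mixed-type complaint, using pandas directly rather than CNVkit's own reader. The two-gene refFlat excerpt, the helper name `read_refflat_text`, and the assumption that the UCSC header line starts with `#` are all illustrative and not taken from the report above.

```python
import io

import pandas as pd

# Hypothetical refFlat excerpt; the first line mimics a UCSC-style header
# (assumed here to begin with "#").
REFFLAT_WITH_HEADER = (
    "#geneName\tname\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd"
    "\texonCount\texonStarts\texonEnds\n"
    "GENE1\tNM_000001\tchr1\t+\t1000\t5000\t1200\t4800\t2\t1000,3000,\t2000,5000,\n"
    "GENE2\tNM_000002\tchr2\t-\t7000\t9000\t7100\t8900\t1\t7000,\t9000,\n"
)


def read_refflat_text(text, skip_comments):
    """Parse tab-separated text without treating any row as column names."""
    return pd.read_csv(
        io.StringIO(text),
        sep="\t",
        header=None,
        # comment="#" makes pandas ignore lines that start with "#".
        comment="#" if skip_comments else None,
    )


raw = read_refflat_text(REFFLAT_WITH_HEADER, skip_comments=False)
clean = read_refflat_text(REFFLAT_WITH_HEADER, skip_comments=True)

# With the header kept as data, column 4 (txStart) holds text and numbers,
# so it is no longer parsed as an integer column.
print(pd.api.types.is_integer_dtype(raw[4]))
print(pd.api.types.is_integer_dtype(clean[4]))
```

On a file this small pandas just falls back to a non-numeric column; the `DtypeWarning` itself tends to appear on larger files that pandas parses in chunks.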

Thanks!

etal commented 3 years ago

Thanks for reporting. Let me see if I can fix read_auto to handle UCSC BED headers automatically instead. I'll mark this as a bug.

tskir commented 3 years ago

I can help with implementing this. @BJWiley233 could you please provide an example of a UCSC file that causes the issue, and preferably also the CNVkit command you're running it through?

BJWiley233 commented 3 years ago

Sure, I will get you one by tomorrow or Monday. Happy 4th of July.

etal commented 2 years ago

Now read_auto should handle these header lines without error.
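
For anyone on an older install where the header still causes trouble, a possible workaround is to strip the leading header lines before giving the file to CNVkit. The sketch below is plain Python, not part of CNVkit; the function name `strip_header_lines` and the set of header prefixes (`#`, `track `, `browser `) are assumptions about what a UCSC download may start with.

```python
def strip_header_lines(src, dest, prefixes=("#", "track ", "browser ")):
    """Copy src to dest without its leading header lines.

    Only lines at the very top of the file are dropped; once the first
    data line is seen, everything else is copied unchanged.
    Returns the number of lines dropped.
    """
    dropped = 0
    in_header = True
    with open(src) as infile, open(dest, "w") as outfile:
        for line in infile:
            if in_header and line.startswith(prefixes):
                dropped += 1
                continue
            in_header = False
            outfile.write(line)
    return dropped


if __name__ == "__main__":
    import sys

    # Usage: python strip_header.py refFlat.txt refFlat.noheader.txt
    count = strip_header_lines(sys.argv[1], sys.argv[2])
    print(f"Dropped {count} header line(s)")
```

The cleaned file can then be used wherever the original annotation file was passed.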