…as when using Ensembl annotations (chromosome naming conventions).
Hi Wolfgang,
This pull request addresses issues I ran into when working with ATAC fragment files that follow Ensembl genome annotation conventions (chromosomes named "1"/1 instead of "chr1", ...) and that contain a header row. The memory-efficient data loading strategy in pandas (pd.read_csv, which is also called internally by BedTool()) led to mixed data types within a column, e.g., chromosome 1 being represented as both "1" and 1, which caused inconsistent values for some chromosomes. In addition, if a header is present, start and end were read as floats instead of integers, as required elsewhere in the code.
The implemented changes should be backward compatible.
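For illustration, here is a minimal sketch of the general idea, not the exact code in this PR: passing explicit dtypes to pd.read_csv keeps chromosome names as strings and coordinates as integers. The column names, the five-column layout, and the read_fragments helper are assumptions made for this example only.

```python
import io

import pandas as pd

# Hypothetical Ensembl-style fragment file: header row, chromosomes named "1", "2", "X".
FRAGMENTS_TSV = (
    "chrom\tstart\tend\tbarcode\tcount\n"
    "1\t100\t250\tAAAC\t1\n"
    "2\t300\t480\tAAAG\t2\n"
    "X\t500\t620\tAAAT\t1\n"
)

# Assumed column layout for this sketch.
COLUMNS = ["chrom", "start", "end", "barcode", "count"]


def read_fragments(handle, has_header=True):
    """Read a fragment file with fixed dtypes (hypothetical helper)."""
    return pd.read_csv(
        handle,
        sep="\t",
        # header=0 skips the existing header row and uses COLUMNS instead.
        header=0 if has_header else None,
        names=COLUMNS,
        # Explicit dtypes avoid per-chunk type inference ("1" vs. 1).
        dtype={
            "chrom": str,
            "start": "int64",
            "end": "int64",
            "barcode": str,
            "count": "int64",
        },
    )


fragments = read_fragments(io.StringIO(FRAGMENTS_TSV))
print(fragments.dtypes)
```

With the dtypes fixed up front, files using "chr1"-style names are read the same way as before, since those names are strings in either case.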
Let me know if you have any questions.
Best, Pia