bxlab / bx-python

Tools for manipulating biological data, particularly multiple sequence alignments
MIT License

documentation? #76

Open mufernando opened 3 years ago

mufernando commented 3 years ago

Are there any docs for the tool?

I was trying to use the axt submodule, without success: what is the species_to_lengths argument?

Thank you!

rsharris commented 3 years ago

I don't know whether overall docs exist.

There is some documentation for species_to_lengths in the code, in lib/bx/align/core.py, in the body of the Alignment class's constructor:

species_to_lengths is needed only for file formats that don't provide chromosome lengths;  it maps each species name to one of these:
  - the name of a file that contains a list of chromosome length pairs
  - a dict mapping chromosome names to their length
  - a single length value (useful when we just have one sequence and no chromosomes)
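As a rough illustration of the three forms above, here is a minimal sketch of how one species' entry might be turned into a sequence length. `resolve_length` is a hypothetical helper, not bx-python's code, and the file branch assumes whitespace-separated `<chrom> <length>` lines with `#` lines skipped; the real file format is whatever bx-python's own reader expects.

```python
from typing import Dict, Optional, Union

# One species' entry in species_to_lengths takes one of three forms.
LengthSpec = Union[str, Dict[str, int], int]


def resolve_length(spec: LengthSpec, chrom: Optional[str] = None) -> int:
    """Return a sequence length from one species' species_to_lengths entry.

    Hypothetical helper for illustration only; not bx-python's implementation.
    """
    if isinstance(spec, int):
        # Single length value: one sequence, so `chrom` is ignored.
        return spec
    if isinstance(spec, dict):
        # Dict mapping chromosome names to lengths.
        return spec[chrom]
    if isinstance(spec, str):
        # Filename. ASSUMED format: "<chrom> <length>" per line, '#' lines skipped.
        with open(spec) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) >= 2 and not fields[0].startswith("#"):
                    if fields[0] == chrom:
                        return int(fields[1])
        raise KeyError(chrom)
    raise TypeError(f"unsupported species_to_lengths value: {spec!r}")
```

A real reader would load a lengths file once into a dict instead of rescanning it on every lookup; the rescan here just keeps the sketch short.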

Because of the way axt describes positions for reverse-strand alignments, chromosome lengths are needed.
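To see why the lengths matter, here is a minimal sketch assuming the UCSC axt convention: coordinates are 1-based and inclusive, and when the strand is `-` the start and end are counted on the reverse complement of the chromosome. Mapping such an interval back to 0-based, half-open forward-strand coordinates needs the chromosome length. `axt_to_forward` is a hypothetical name; bx-python's internal conversion may be organized differently.

```python
from typing import Tuple


def axt_to_forward(start: int, end: int, strand: str, chrom_length: int) -> Tuple[int, int]:
    """Convert an axt interval to 0-based, half-open forward-strand coordinates.

    ASSUMES the axt convention: `start`/`end` are 1-based and inclusive, and for
    strand '-' they are counted on the reverse-complemented chromosome.
    """
    if not 1 <= start <= end <= chrom_length:
        raise ValueError("interval outside chromosome")
    if strand == "+":
        return start - 1, end
    if strand == "-":
        # 1-based position p on the reverse complement is position
        # chrom_length - p + 1 on the forward strand, so the interval flips.
        return chrom_length - end, chrom_length - start + 1
    raise ValueError(f"bad strand: {strand!r}")
```

For example, `axt_to_forward(1, 10, "-", 100)` gives `(90, 100)`: the first ten bases of the reverse strand are the last ten of the forward strand. The `-` branch cannot be computed at all without `chrom_length`.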

So it's a dict that maps each of the two species names (only two, because axt supports only pairwise alignments) to either a filename or a dict that maps chromosome names to their lengths. The "single length value" variant can be used if your alignment is of one sequence to another and (I suppose) its sequence names don't contain chromosome names (or the chromosome name is used in place of a species name).
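Concretely, a species_to_lengths for a pairwise axt file might look like the dict below. The assembly names and the filename `mm10.lengths` are illustrative placeholders, and `check_axt_lengths` is a hypothetical sanity check written for this sketch, not part of bx-python.

```python
from typing import Dict, Union

LengthSpec = Union[str, Dict[str, int], int]

# Illustrative example: one species given as a dict, the other as a lengths file.
species_to_lengths: Dict[str, LengthSpec] = {
    "hg38": {"chr1": 248956422, "chr2": 242193529},
    "mm10": "mm10.lengths",  # hypothetical file of chromosome/length pairs
}


def check_axt_lengths(mapping: Dict[str, LengthSpec]) -> None:
    """Raise ValueError unless `mapping` has the shape described above (hypothetical)."""
    # axt is pairwise, so exactly two species are expected.
    if len(mapping) != 2:
        raise ValueError(f"expected 2 species, got {len(mapping)}")
    for species, spec in mapping.items():
        if isinstance(spec, dict):
            # Chromosome name -> positive integer length.
            for chrom, length in spec.items():
                if not isinstance(length, int) or length <= 0:
                    raise ValueError(f"{species}.{chrom}: bad length {length!r}")
        elif isinstance(spec, int):
            # Single length value for a single sequence.
            if spec <= 0:
                raise ValueError(f"{species}: bad length {spec!r}")
        elif not isinstance(spec, str):
            # Anything else is neither a filename, a dict, nor a length.
            raise ValueError(f"{species}: unsupported value {spec!r}")


check_axt_lengths(species_to_lengths)  # passes without raising
```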

The format of the "file that contains a list of chromosome length pairs" is described at the top of lib/bx/misc/readlengths.py.

As I recall, there's some assumption that sequence names have the form <species>.<chrom>.
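A minimal sketch of splitting such names, assuming the species part never contains a dot, so the name is split at the first dot only; that keeps scaffold names with their own dots (for example `KI270302.1`) intact. A name with no dot is treated as a bare sequence name with no chromosome, which is where the single-length form would apply. `split_src` is a hypothetical function; bx-python's actual parsing rule may differ.

```python
from typing import Optional, Tuple


def split_src(name: str) -> Tuple[str, Optional[str]]:
    """Split '<species>.<chrom>' into (species, chrom); hypothetical helper.

    Splits at the first dot only, so chromosome or scaffold names that
    themselves contain dots stay intact. Returns chrom=None when there is no dot.
    """
    species, sep, chrom = name.partition(".")
    return (species, chrom) if sep else (species, None)
```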

nsoranzo commented 3 years ago

The docs are at https://bx-python.readthedocs.io/ , but that's mostly generated from the method/class docstrings, and species_to_lengths was not documented there, so thanks @rsharris for digging through the code!

rsharris commented 3 years ago

There's a 95% chance that species_to_lengths was part of my minor contributions to the package, circa 2005. So I was pretty sure that was documented somewhere, but it did take a little digging to locate it.