A-J-F-Mackintosh / syngraph

Toolkit for evolutionary analyses of linkage groups
GNU General Public License v3.0

format of sequence column in busco_full_table.tsv #8

Open XuanZhang-Black opened 6 months ago

XuanZhang-Black commented 6 months ago

Hi Alex,

I used BUSCO to evaluate genome_assembly.fa, and for syngraph I extracted the following from the resulting *full_table.tsv file:

0at7088 25 6419737 6528949
1at7088 28 9918437 9798722
2at7088 5 9853750 9950607
3at7088 8 16558545 16762952

When I ran it, I encountered an error:

    Traceback (most recent call last):
      File "/home/data/t240413/software/syngraph-master/syngraph", line 7, in <module>
        main()
      File "/home/data/t240413/software/syngraph-master/cli/interface.py", line 39, in main
        build.main(run_params)
      File "/home/data/t240413/software/syngraph-master/cli/build.py", line 56, in main
        markerObjs = sg.load_markerObjs(parameterObj)
      File "/home/data/t240413/software/syngraph-master/source/syngraph.py", line 35, in load_markerObjs
        df = pd.read_csv(infile,
      File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
        return _read(filepath_or_buffer, kwds)
      File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
        return parser.read(nrows)
      File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
        ) = self._engine.read(  # type: ignore[attr-defined]
      File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
        chunks = self._reader.read_low_memory(nrows)
      File "pandas/_libs/parsers.pyx", line 812, in pandas._libs.parsers.TextReader.read_low_memory
      File "pandas/_libs/parsers.pyx", line 889, in pandas._libs.parsers.TextReader._read_rows
      File "pandas/_libs/parsers.pyx", line 1034, in pandas._libs.parsers.TextReader._convert_column_data
      File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_tokens
      File "pandas/_libs/parsers.pyx", line 1192, in pandas._libs.parsers.TextReader._convert_with_dtype
    ValueError: Integer column has NA values in column 2

I have checked my file, and the second column contains the chromosome number. Does the sequence column have any special format requirements? I look forward to your reply.

Best,

Xuan Zhang

A-J-F-Mackintosh commented 6 months ago

Hi Xuan Zhang,

I suspect that your BUSCO TSV includes lines for genes that are missing from your assembly. These lines have empty coordinate fields, which pandas reads as NA values that cannot be stored in an integer column, hence the error. You should grep the file so that it only includes Complete BUSCOs (you can include Fragmented ones too if you like).
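As a minimal Python sketch of that filtering step, the helper below keeps only rows whose Status is in `keep` (Complete by default). It assumes the standard BUSCO full_table.tsv layout, where lines starting with `#` are comments and Status is the second tab-separated column, taking values such as Complete, Duplicated, Fragmented or Missing. The name `filter_busco_rows` is hypothetical; it is not part of syngraph or BUSCO.

```python
# Hypothetical helper (not part of syngraph or BUSCO).
# Assumes the BUSCO full_table.tsv layout: '#' comment lines, then
# tab-separated rows whose second field is the Status.
def filter_busco_rows(lines, keep=("Complete",)):
    kept = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # drop comment/header and blank lines, as grep would
        fields = line.rstrip("\n").split("\t")
        # Missing rows have no Sequence/Start/End, so excluding them
        # removes the empty coordinates that pandas turns into NA.
        if len(fields) > 1 and fields[1] in keep:
            kept.append(line if line.endswith("\n") else line + "\n")
    return kept
```

For example, passing an open file handle for full_table.tsv with `keep=("Complete", "Fragmented")` returns the lines to write to a new input file. Matching the Status field exactly, rather than searching for the word anywhere on the line, also excludes Duplicated rows.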

Also, make sure that the file is tab-delimited.
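A quick way to check the delimiter, sketched in Python under the same BUSCO layout assumption: the hypothetical helper `find_non_tab_lines` reports the 1-based line numbers of data lines that do not split into at least `min_fields` fields on tab characters. The default of 5 (Busco id, Status, Sequence, Gene Start, Gene End) is an assumption about what a Complete or Fragmented row contains, not a documented syngraph requirement.

```python
# Hypothetical check (not part of syngraph). Flags data lines that do not
# split into at least `min_fields` tab-separated fields, e.g. rows whose
# columns were separated by spaces instead of tabs.
def find_non_tab_lines(lines, min_fields=5):
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header and blank lines
        if len(line.rstrip("\n").split("\t")) < min_fields:
            bad.append(lineno)
    return bad
```

A space-separated row splits into a single field on tabs, so it is always reported; an empty result means every data line has at least the assumed number of tab-separated columns.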

Cheers,

Alex