MiraldiLab / maxATAC

Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks
Apache License 2.0

`maxATAC` prepare with sc 10X ATAC #119

Closed: daccachejoe closed 2 months ago

daccachejoe commented 1 year ago

Hi!

I am trying to run maxATAC on scATAC data from the 10X system. I am running `maxatac prepare` and pointing it at my fragments file, which ends with .tsv.gz, and it fails with what I think is a pandas error. Here is the output for reference:

[2023-04-13 14:25:01,922]
Input file: /gpfs/data/sequence/results/naiklab/2023-03-24/cellranger/count-CTRL/outs/atac_fragments.tsv.gz 
Input chromosome sizes file: /gpfs/home/jd5457/opt/maxatac/data/hg38/hg38.chrom.sizes 
Tn5 cut sites will be slopped 20 bps on each side 
Input blacklist file: /gpfs/home/jd5457/opt/maxatac/data/hg38/hg38_maxatac_blacklist.bw 
Output filename: CTRL
Output directory: ./data/max-atac-prepare 
Using a millions factor of: 20000000 
Using 48 threads to run job.
[2023-04-13 14:25:02,067]
Generate the normalized signal tracks.
[2023-04-13 14:25:02,067]
Working on 10X scATAC fragments file 
 Converting fragment files to Tn5 sites
Traceback (most recent call last):
  File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/maxatac/bin/maxatac", line 24, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/maxatac/bin/maxatac", line 20, in main
    args.func(args)
  File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/maxatac/lib/python3.9/site-packages/maxatac/analyses/prepare.py", line 93, in run_prepare
    bed_df = convert_fragments_to_tn5_bed(args.input, ALL_CHRS)
  File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/maxatac/lib/python3.9/site-packages/maxatac/utilities/prepare_tools.py", line 25, in convert_fragments_to_tn5_bed
    df = pd.read_table(fragments_tsv,
  File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/maxatac/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1242, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/maxatac/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/maxatac/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/gpfs/share/apps/anaconda3/gpu/5.2.0/envs/maxatac/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 239, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 794, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 889, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1034, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1192, in pandas._libs.parsers.TextReader._convert_with_dtype
ValueError: Integer column has NA values in column 1

ANRudrapatna commented 9 months ago

Hello!

We are sorry for the massive delay in getting back to you! Were you able to find a solution to this issue? If not, would you be able to share the first few lines of your fragments file? This error might be an issue with your fragments file. Thanks!
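
For anyone who hits the same `ValueError`, a quick way to gather the requested lines is sketched below. It prints the first few lines of the fragments file and flags rows whose start or end coordinate is not an integer. The function name `inspect_fragments` and its parameters are hypothetical and not part of maxATAC. The check for `#` header lines is an assumption about one possible cause, not something confirmed in this thread.

```python
import gzip
import itertools


def _is_int(text):
    """Return True if text parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def inspect_fragments(path, n_preview=5, max_rows=100_000):
    """Preview a fragments file and report lines that may break integer parsing.

    Hypothetical helper (not a maxATAC function). Assumes a tab-separated
    file with columns: chrom, start, end, barcode, count.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    comment_lines = 0
    bad_rows = []
    with opener(path, "rt") as handle:
        # Only scan the first max_rows lines so large files stay fast.
        for line_no, line in enumerate(itertools.islice(handle, max_rows), start=1):
            line = line.rstrip("\n")
            if line_no <= n_preview:
                print(line)
            if line.startswith("#"):
                # Assumption: header lines like this may confuse a parser
                # that expects integers in the second column.
                comment_lines += 1
                continue
            fields = line.split("\t")
            if len(fields) < 3 or not (_is_int(fields[1]) and _is_int(fields[2])):
                bad_rows.append((line_no, line))
    return {"comment_lines": comment_lines, "bad_rows": bad_rows}
```

Calling `inspect_fragments("atac_fragments.tsv.gz")` returns a dictionary with the number of `#` lines seen and a list of `(line_number, line)` pairs for rows with a missing or non-integer coordinate. If either is non-empty, that output is worth posting along with the preview.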

Elfaba commented 6 months ago

Thank you for looking into this. I will try out your solution. Best.