@anthonybejjani ran into an error when preparing scATAC-seq fragment files, and I was able to reproduce it. Here is the log from my run:
```
[2022-05-27 12:54:29,022]
Input file: /Users/caz3so/scratch/20220525_maxatac_scatac_subset/GM12878_scATAC_10k_fragments.tsv.gz
Input chromosome sizes file: /Users/caz3so/opt/maxatac/data/hg38/hg38.chrom.sizes
Tn5 cut sites will be slopped 20 bps on each side
Input blacklist file: /Users/caz3so/opt/maxatac/data/hg38/hg38_maxatac_blacklist.bw
Output filename: GM12878_scatac_10k
Output directory: /Users/caz3so/scratch/20220525_maxatac_scatac_subset
Using a millions factor of: 20000000
Using 9 threads to run job.
[2022-05-27 12:54:29,023]
Generate the normalized signal tracks.
[2022-05-27 12:54:29,023]
Working on 10X scATAC fragments file
Converting fragment files to Tn5 sites
Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1113, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/caz3so/workspaces/miraldiLab/maxATAC/maxatac/bin/maxatac", line 24, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/Users/caz3so/workspaces/miraldiLab/maxATAC/maxatac/bin/maxatac", line 20, in main
    args.func(args)
  File "/Users/caz3so/workspaces/miraldiLab/maxATAC/maxatac/analyses/prepare.py", line 93, in run_prepare
    bed_df = convert_fragments_to_tn5_bed(args.input, ALL_CHRS)
  File "/Users/caz3so/workspaces/miraldiLab/maxATAC/maxatac/utilities/prepare_tools.py", line 25, in convert_fragments_to_tn5_bed
    df = pd.read_table(fragments_tsv,
  File "/Users/caz3so/opt/anaconda3/envs/maxatac/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/Users/caz3so/opt/anaconda3/envs/maxatac/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 779, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/Users/caz3so/opt/anaconda3/envs/maxatac/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 581, in _read
    return parser.read(nrows)
  File "/Users/caz3so/opt/anaconda3/envs/maxatac/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1254, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/Users/caz3so/opt/anaconda3/envs/maxatac/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 787, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 883, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1026, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: invalid literal for int() with base 10: 'GAGATTCCAAAGGTCG-1'
```
This reproduction was on my Mac. The problem is in the function that converts fragments to Tn5 sites, `convert_fragments_to_tn5_bed` in `prepare_tools.py`, and specifically in the `pd.read_table` call that reads in the fragments file. I had originally written that call to set both the column dtypes and the column names, to save memory and speed up reading the text file.
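A minimal way to see this failure mode outside maxATAC is to force an integer dtype onto the barcode column. The original dtype mapping is not shown here, so `bad_dtypes` below is a hypothetical mismatch and the fragment row is made up apart from the barcode, which is copied from the traceback. It only illustrates that pandas raises a `ValueError` like the one above when a barcode string lands in an integer column.

```python
import io

import pandas as pd

# One made-up fragments row in the 10x layout:
# chrom, start, end, barcode, duplicate count
row = "chr1\t10066\t10279\tGAGATTCCAAAGGTCG-1\t1\n"

# Hypothetical mismatch: the barcode column is declared as int32
bad_dtypes = {"chr": str, "start": "int32", "stop": "int32", "barcode": "int32"}

error_message = None
try:
    pd.read_table(
        io.StringIO(row),
        sep="\t",
        header=None,
        usecols=[0, 1, 2, 3],
        names=["chr", "start", "stop", "barcode"],
        dtype=bad_dtypes,
    )
except ValueError as err:
    # pandas cannot turn the barcode string into an integer
    error_message = str(err)

print(error_message)
```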
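Reading the file without setting the dtypes works. The dtypes are not needed for the function to work, and here they cause the failure: the traceback shows pandas trying to convert the barcode `GAGATTCCAAAGGTCG-1` to an integer, which suggests the dtype specification did not match the columns in this file. I changed the import to set only the column names: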
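```python
import pandas as pd

# Import the first four columns of the fragments TSV (chr, start, stop, barcode)
# as a dataframe; dtypes are left for pandas to infer
df = pd.read_table(fragments_tsv,
                   sep="\t",
                   header=None,
                   usecols=[0, 1, 2, 3],
                   names=["chr", "start", "stop", "barcode"]
                   )
```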
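If the memory savings from explicit dtypes are still wanted, one option is to key each dtype by column name right next to `names`, so a dtype cannot drift onto the wrong column. This is a sketch under assumptions, not the code in maxATAC: `read_fragments` and `FRAGMENT_DTYPES` are hypothetical names, the categorical barcode column assumes barcodes repeat across many fragments, and the two example rows are made up in the 10x layout (chrom, start, end, barcode, duplicate count).

```python
import io

import pandas as pd

# Hypothetical mapping: names and dtypes are declared together, keyed by name,
# so each dtype is applied to the column it was written for
FRAGMENT_DTYPES = {
    "chr": "category",      # few distinct chromosomes, so a categorical saves memory
    "start": "int32",       # hg38 coordinates fit in int32
    "stop": "int32",
    "barcode": "category",  # assumption: barcodes repeat across many fragments
}


def read_fragments(fragments_tsv):
    """Read the first four columns of a 10x fragments file with explicit dtypes."""
    return pd.read_table(
        fragments_tsv,
        sep="\t",
        header=None,
        usecols=[0, 1, 2, 3],           # drop the duplicate-count column
        names=list(FRAGMENT_DTYPES),    # one name per used column
        dtype=FRAGMENT_DTYPES,
    )


# Two made-up rows; a real call would pass the .tsv.gz path instead
example = io.StringIO(
    "chr1\t10066\t10279\tGAGATTCCAAAGGTCG-1\t1\n"
    "chr1\t10073\t10339\tTTTGCGCTCAGCAACT-1\t2\n"
)
df = read_fragments(example)
print(df.dtypes)
```

Because `names` has the same length as `usecols`, pandas assigns the names to the selected columns, and the dtype dictionary is then matched by those names rather than by position.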