UcarLab / AMULET

A count-based method for detecting doublets from single-nucleus ATAC-seq (snATAC-seq) data.
https://ucarlab.github.io/AMULET/
GNU General Public License v3.0

AMULET output contains only the Overlaps file; there is no overlap summary #23

Open jasiozaucha opened 7 months ago

jasiozaucha commented 7 months ago

I tried running the AMULET.sh script, but it only produces the Overlaps.txt output; the OverlapSummary.txt file is never written.

Then the Python script throws an error:

    Traceback (most recent call last):
        summarydata = pd.read_csv(args.overlapsummary, sep="\t").values
    pandas.errors.EmptyDataError: No columns to parse from file
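pandas raises EmptyDataError when read_csv is handed a file with no content, so the traceback above only says that OverlapSummary.txt is empty, not why. Below is a minimal sketch of a guard that fails earlier with a more descriptive message. load_overlap_summary is a hypothetical helper, not part of AMULET, and the error wording is illustrative.

```python
import os

import pandas as pd


def load_overlap_summary(path):
    """Hypothetical helper (not part of AMULET): read OverlapSummary.txt,
    failing with a descriptive message if the file is missing or empty."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} was not written; the overlap-counting step probably failed"
        )
    if os.path.getsize(path) == 0:
        raise ValueError(
            f"{path} is empty; check the overlap-counting step for errors"
        )
    # Same call as in the traceback, now only reached for non-empty files.
    return pd.read_csv(path, sep="\t").values
```

The check only covers a missing or zero-byte file; anything else is still left to pandas to parse.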

ajt986 commented 7 months ago

Hi,

It looks like there was an error while the overlaps were being calculated. The overlaps file is likely partial, since it is written incrementally as overlaps are found, whereas the summary file is written only at the very end. Was an error reported while the overlaps were being found?
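The failure mode described here can be reproduced with a toy model. The counter streams each overlap to Overlaps.txt as it is found and writes OverlapSummary.txt only after the loop. count_overlaps and its fail_before_summary flag are hypothetical. They only mimic the write order described in this comment, not AMULET's actual overlap logic.

```python
import os


def count_overlaps(barcodes, out_dir, fail_before_summary=False):
    """Toy model (hypothetical, not AMULET's code) of a counter that streams
    per-overlap lines to Overlaps.txt and writes OverlapSummary.txt last."""
    counts = {bc: 0 for bc in barcodes}
    with open(os.path.join(out_dir, "Overlaps.txt"), "w") as overlaps:
        for bc in barcodes:
            counts[bc] += 1
            # Each overlap line is written as soon as it is "found".
            overlaps.write(f"{bc}\tfound\n")
    if fail_before_summary:
        # Any exception raised here leaves Overlaps.txt on disk but no summary.
        raise RuntimeError("simulated failure before the summary is written")
    with open(os.path.join(out_dir, "OverlapSummary.txt"), "w") as summary:
        summary.write("barcode\tcount\n")
        for bc, n in counts.items():
            summary.write(f"{bc}\t{n}\n")
    return counts
```

Calling count_overlaps(["A", "B"], some_dir, fail_before_summary=True) leaves a non-empty Overlaps.txt and no OverlapSummary.txt. That is the same pair of outputs reported in this issue.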

Also, are you using the alignment (.bam) file or the fragments file as input? One possible workaround is to generate the overlap outputs with the other input method, if you can (i.e., use the BAM file if you previously used the fragments file, or the fragments file if you previously used the BAM file).

jasiozaucha commented 7 months ago

No errors came up while calculating overlaps. The script seems to run smoothly up to the point where it tries to load the outputs of the first step. I used the fragments file because I don't have the BAM file.

ajt986 commented 7 months ago

Does your input single-cell CSV file include an is__cell_barcode column? The fragment overlap counter filters out all barcodes where this column is not set to 1. It is also recommended that all other barcodes (the ones to be excluded) be set to 0, because the column needs to be read as numeric; if it is read as strings, every barcode may be filtered out. Regardless of whether any overlaps are detected, you should still get a summary file unless the barcodes are not being read or are being filtered out entirely (which results in an empty overlap summary file).
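A small pandas sketch of the string-versus-numeric pitfall follows. It applies the same filter as the initialization code. The barcode is kept when is__cell_barcode is parsed as integers, but everything is dropped when the column is read as strings, because "1" == 1 is False. The CSV content is made up, and dtype=object is used only to force the string case.

```python
import io

import pandas as pd

# Made-up singlecell.csv content: one kept barcode (1), one excluded (0).
CSV_TEXT = "barcode,is__cell_barcode\nAAAC-1,1\nAAAG-1,0\n"

# Default parsing infers an integer column, so == 1 keeps one barcode.
numeric = pd.read_csv(io.StringIO(CSV_TEXT))
kept_numeric = numeric[numeric["is__cell_barcode"] == 1]

# Forcing Python strings (dtype=object): "1" == 1 is False for every row,
# so the same filter drops all barcodes.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), dtype=object)
kept_text = as_text[as_text["is__cell_barcode"] == 1]

print(len(kept_numeric), len(kept_text))  # 1 0
```

If the column does turn out to be strings, converting it with pd.to_numeric before filtering is one option.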

Relevant code:

Initialization of the summary data (note: you should still get an entry for every barcode, since all values are initialized to 0):

    #Set up barcode maps
    sc_data = pd.read_csv(singlecellfile)
    sc_data = sc_data[sc_data['is__cell_barcode'] == 1]
    bc_map = dict()
    previous_reads = dict()
    previous_ends = dict()
    overlapcounts = dict()
    vreadspercell = dict()
    readspercell = dict()

    for curbarcode in sc_data['barcode']:
        bc_map[curbarcode] = []
        previous_reads[curbarcode] = []
        previous_ends[curbarcode] = -1
        overlapcounts[curbarcode] = 0
        vreadspercell[curbarcode] = 0
        readspercell[curbarcode] = 0

Code to write the file:

    writeOverlapSummary(path+"/OverlapSummary.txt", overlapcounts, vreadspercell, readspercell)

jasiozaucha commented 7 months ago

Yes, the column you asked about is present, and many barcodes are set to 1.

[Screenshot, 2024-02-07 18:03:54]

ajt986 commented 7 months ago

Based on this, if you inspected overlapcounts, I'd expect to see a dictionary with barcodes as keys and 0 as values. This is a very strange error, and the possible causes I can think of have essentially been ruled out.

I'm sorry I don't have an answer as to what else could be going wrong. Unfortunately, I don't have a better suggestion than manually running the code in a Jupyter notebook or Python terminal and stepping through it.

jasiozaucha commented 7 months ago

OK, I found the problem: it is due to the removal of the deprecated np.object alias in newer NumPy versions. With a newer version of NumPy, the code should throw the following error:

    AttributeError: module 'numpy' has no attribute 'object'.
    `np.object` was a deprecated alias for the builtin `object`.

In line 193 of FragmentFileOverlapCounter.py, change dtype=np.object to dtype=object:

    table = np.empty(shape=(len(overlapcounts),len(colnames)), dtype=object)
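Here is a minimal sketch of the corrected allocation. It assumes a toy overlapcounts dictionary and made-up column names (colnames here is hypothetical). Every NumPy version accepts the plain builtin object as a dtype. The np.object alias, by contrast, was deprecated in NumPy 1.20 and removed in 1.24.

```python
import numpy as np

# Stand-in for the per-barcode dictionary built earlier (toy values).
overlapcounts = {"AAAC-1": 2, "AAAG-1": 0}
colnames = ["barcode", "overlaps"]  # hypothetical column names

# dtype=object works on every NumPy version; dtype=np.object raises
# AttributeError on NumPy >= 1.24, where the deprecated alias was removed.
table = np.empty(shape=(len(overlapcounts), len(colnames)), dtype=object)
for i, (barcode, count) in enumerate(overlapcounts.items()):
    table[i, 0] = barcode
    table[i, 1] = count

print(table.dtype, table.shape)  # object (2, 2)
```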

What makes debugging really difficult is that, for some reason, invoking AMULET via the shell script does not surface this error. It would be very useful for you guys to create a PyPI package, which would make it possible to specify the dependency versions that AMULET runs with.
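Errors often go unnoticed in multi-step pipelines when the wrapper does not check each step's exit status. This thread does not establish whether that applies to AMULET.sh. The sketch below shows the idea in Python using subprocess.run(check=True), which raises CalledProcessError when a step exits with a nonzero status. run_step is a hypothetical wrapper, not part of AMULET.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one pipeline step and raise if it exits with a nonzero status.

    Hypothetical wrapper, not part of AMULET: check=True turns a failed step
    into subprocess.CalledProcessError instead of letting the next step run
    on missing or empty output files.
    """
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    try:
        # Simulated failing step: a Python process that exits with status 3.
        run_step([sys.executable, "-c", "raise SystemExit(3)"])
    except subprocess.CalledProcessError as err:
        print("step failed with exit code", err.returncode)  # 3
```

In a bash wrapper, set -e plays a roughly similar role. Pinning dependency versions, as suggested above, addresses the root cause rather than just making the failure visible.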

Cheers!