PouletAxel / SIP

SIP: Significant Interaction Peak caller
GNU General Public License v3.0

Chromosome records in chromsizes but absent from data cause KeyError exception. #9

Closed adadiehl closed 3 years ago

adadiehl commented 3 years ago

See traceback...

`Traceback (most recent call last):
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'chr19_KI270915v1_alt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/cooler/util.py", line 167, in parse_region
    clen = chromsizes[chrom] if chromsizes is not None else None
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/pandas/core/series.py", line 882, in __getitem__
    return self._get_value(key)
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/pandas/core/series.py", line 989, in _get_value
    loc = self.index.get_loc(label)
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'chr19_KI270915v1_alt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/adadiehl/.conda/envs/cooler_env/bin/cooler", line 10, in <module>
    sys.exit(cli())
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/cooler/cli/_util.py", line 200, in decorated
    func(*args, **kwargs)
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/cooler/cli/dump.py", line 379, in dump
    h5, c._chromids, parse_region(range, c.chromsizes), binsize=c.binsize
  File "/home/adadiehl/.conda/envs/cooler_env/lib/python3.7/site-packages/cooler/util.py", line 169, in parse_region
    raise ValueError("Unknown sequence label: {}".format(chrom))
ValueError: Unknown sequence label: chr19_KI270915v1_alt
`

Filtering the offending records from the chromsizes file resolved the issue.
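The workaround described above can be sketched in Python. This is a minimal illustration, not part of SIP or cooler: `filter_chromsizes` and its arguments are hypothetical names, and it assumes the chromsizes file is tab-separated with the chromosome name in the first column. The set of names actually present in the data has to come from the matrix file itself, for example from the `chromnames` attribute of a `cooler.Cooler` object.

```python
from pathlib import Path


def filter_chromsizes(src, dst, present):
    """Copy chromsizes rows whose chromosome name is in `present`.

    `src` and `dst` are file paths; `present` is a set of chromosome
    names found in the matrix file. Returns the names that were dropped.
    """
    kept = []
    dropped = []
    for line in Path(src).read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name = line.split("\t", 1)[0]  # assumed layout: name<TAB>size
        if name in present:
            kept.append(line)
        else:
            dropped.append(name)
    Path(dst).write_text("".join(row + "\n" for row in kept))
    return dropped
```

Writing to a separate output file keeps the original chromsizes file intact, and the returned list makes it easy to see which records (such as `chr19_KI270915v1_alt`) were removed.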

PouletAxel commented 3 years ago

Did you use the chr.sizes file both to make the mcool file and for the SIP analysis? This looks mostly like a cooler error caused by the absence of chr19_KI270915v1_alt from the mcool file.

adadiehl commented 3 years ago

The short answer is that, based on errors and (lack of) results from SIP, I suspect a problem with the matrix files. At least, they might not contain what SIP is expecting, as they come from the Pore-C method. This may be out of your wheelhouse, but if you can make any sense of the problem, it would be really helpful.

These are not my own data, but are from the recent preprint for the Pore-C method (https://www.biorxiv.org/content/10.1101/833590v1.full.pdf).

Data were retrieved from https://ont-datasets-us-east-1-public.s3.amazonaws.com/20191103.preprint_NA12878.tar.gz and I am using their chromsizes file, which is in results/refgenome/GRCh38.rg.chromsizes. I can only guess that this is the chromsizes file they used to prepare the matrices, which are in data/matrix.

Interestingly, though taking out the unrepresented entries from the chromsizes file appeared to get SIP running properly, I'm not sure things are truly working. For one, the program never returned to the command line after printing the "End of SIP loops are available in loops/" message, requiring a ctrl-c. Worse yet, the 5kbLoops.txt file was empty after the run finished. Looking back through terminal messages, I saw a lot of this...

'11 loops/5kb/chrX/chrX_55000000_64999999.txt
2 loops/5kb/chrX/chrX_10000000_19999999.txt
3 loops/5kb/chrX/chrX_15000000_24999999.txt
10 loops/5kb/chrX/chrX_50000000_59999999.txt
5 loops/5kb/chrX/chrX_25000000_34999999.txt
17 loops/5kb/chrX/chrX_85000000_94999999.txt
12 loops/5kb/chrX/chrX_60000000_69999999.txt
30 loops/5kb/chrX/chrX_150000000_156040895.txt
0 loops/5kb/chrX/chrX_0_9999999.txt
13 loops/5kb/chrX/chrX_65000000_74999999.txt
1 loops/5kb/chrX/chrX_5000000_14999999.txt
7 loops/5kb/chrX/chrX_35000000_44999999.txt
27 loops/5kb/chrX/chrX_135000000_144999999.txt
9 loops/5kb/chrX/chrX_45000000_54999999.txt
4 loops/5kb/chrX/chrX_20000000_29999999.txt
####### End loops detection for chr chrX
0 loops before the FDR filter
Filtering value at 0.01 FDR is 10000.0 APscore and 10000.0 RegionalAPscore

Deleting image file for chrX'

Why are there 0 loops before the FDR filter when it looks like there should be ~150?
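A quick arithmetic check on the terminal output above may help with this question. The thread does not say what the leading integers mean; the sketch below only tests one possible reading, namely that each integer is the index of a window stepped by 5 Mb (start coordinate divided by 5,000,000) rather than a loop count. The `STEP` value and the regular expression are assumptions made for this check.

```python
import re

# Number/path pairs copied from the 5 kb terminal output quoted above.
LOG = """
11 loops/5kb/chrX/chrX_55000000_64999999.txt
2 loops/5kb/chrX/chrX_10000000_19999999.txt
3 loops/5kb/chrX/chrX_15000000_24999999.txt
10 loops/5kb/chrX/chrX_50000000_59999999.txt
5 loops/5kb/chrX/chrX_25000000_34999999.txt
17 loops/5kb/chrX/chrX_85000000_94999999.txt
12 loops/5kb/chrX/chrX_60000000_69999999.txt
30 loops/5kb/chrX/chrX_150000000_156040895.txt
0 loops/5kb/chrX/chrX_0_9999999.txt
13 loops/5kb/chrX/chrX_65000000_74999999.txt
1 loops/5kb/chrX/chrX_5000000_14999999.txt
7 loops/5kb/chrX/chrX_35000000_44999999.txt
27 loops/5kb/chrX/chrX_135000000_144999999.txt
9 loops/5kb/chrX/chrX_45000000_54999999.txt
4 loops/5kb/chrX/chrX_20000000_29999999.txt
"""

STEP = 5_000_000  # assumed window step, inferred from the file names

# Extract (leading integer, window start) from every line of the log.
pairs = [
    (int(num), int(start))
    for num, start in re.findall(
        r"^(\d+) loops/5kb/chrX/chrX_(\d+)_\d+\.txt$", LOG, flags=re.MULTILINE
    )
]

# True only if every leading integer equals start / STEP.
all_match = all(num * STEP == start for num, start in pairs)
# Sum of the leading integers, i.e. the figure read as "~150" above.
total = sum(num for num, _ in pairs)
print(all_match, total)
```

All 15 pairs satisfy the relation, and the integers sum to 151. That would be consistent with the numbers being window indices rather than per-file loop counts, although nothing in the thread confirms this reading.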

In case it helps, the Pore-C processing pipeline and Snakemake workflows they employ are in these repositories: https://github.com/nanoporetech/pore-c and https://github.com/nanoporetech/Pore-C-Snakemake

Thank you for your time!

Best, Adam

PouletAxel commented 3 years ago

I will check that.

Can you rerun the program at 10 kb resolution, as in their paper? -res 10000 -mat 1000 -t 1500 -fdr 0.05

adadiehl commented 3 years ago

Done... Still no loop calls, though I see different loop counts written to the terminal now...

'Deleting image file for chrUn_KI270391v1
3 loops/10kb/chrX/chrX_15000000_24999999.txt
10 loops/10kb/chrX/chrX_50000000_59999999.txt
5 loops/10kb/chrX/chrX_25000000_34999999.txt
17 loops/10kb/chrX/chrX_85000000_94999999.txt
12 loops/10kb/chrX/chrX_60000000_69999999.txt
30 loops/10kb/chrX/chrX_150000000_156040895.txt
0 loops/10kb/chrX/chrX_0_9999999.txt
13 loops/10kb/chrX/chrX_65000000_74999999.txt
1 loops/10kb/chrX/chrX_5000000_14999999.txt
7 loops/10kb/chrX/chrX_35000000_44999999.txt
27 loops/10kb/chrX/chrX_135000000_144999999.txt
9 loops/10kb/chrX/chrX_45000000_54999999.txt
4 loops/10kb/chrX/chrX_20000000_29999999.txt
####### End loops detection for chr chrX
0 loops before the FDR filter
Filtering value at 0.05 FDR is 10000.0 APscore and 10000.0 RegionalAPscore

Deleting image file for chrX'

Hope this helps!

Best, Adam

PouletAxel commented 3 years ago

Thanks, I will work on it and get back to you. I don't know exactly when, but I hope before the end of the week.

Best, Axel

PouletAxel commented 3 years ago

Hi, sorry for the delay in answering; I was working on other things. The problem here is that the values in the mcool file are really low, so most of them are close to 0 and come out black during the image processing. I set too small a factor to correct for that. It was enough for all the data sets I tested, but too small for this one. You can wait for my next release, or you can multiply the last two columns of the files made by SIP by 100 or 1000.

Best, Axel
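The manual workaround suggested above could be sketched as follows. This is an illustration under assumptions, not SIP's own code: `scale_last_two_columns` and `scale_file` are hypothetical helpers, the files are assumed to be tab-separated with numeric values in the last two columns, and the default factor of 1000 is just one of the two values mentioned.

```python
def scale_last_two_columns(line, factor):
    """Return `line` with its last two tab-separated fields multiplied by `factor`."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least two columns: %r" % line)
    for i in (-2, -1):
        # Assumption: the last two columns hold plain numeric values.
        fields[i] = repr(float(fields[i]) * factor)
    return "\t".join(fields)


def scale_file(src, dst, factor=1000):
    """Write a copy of `src` to `dst` with the last two columns scaled."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                fout.write(scale_last_two_columns(line, factor) + "\n")
```

Writing to a new file rather than editing in place makes it possible to compare runs with different factors (100 versus 1000) against the untouched originals.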