4dn-dcic / pairix

1D/2D indexing and querying on bgzipped text file with a pair of genomic coordinates
MIT License

[ti_index_core] the indexes overlap or are out of bounds #73

Open molecule53 opened 1 year ago

molecule53 commented 1 year ago

Hello, I am trying to run the pairix tool as part of an analysis with cooler to create a contact map, but I am running into this error:

(base) ubuntu@ip-172-31-18-119:/Data1$ pairix corrected2_porec_test.concatemers.pairs.txt.gz
[get_intv] the following line cannot be parsed and skipped: CONCAT0 + - UU 1 chr2 5443 chr1 3003 32 R1
[ti_index_core] the indexes overlap or are out of bounds

zcat corrected2_porec_test.concatemers.pairs.txt.gz | head -n 20
## pairs format v1.0.0
#shape: whole matrix
#genome_assembly: unknown
#chromsize: chr1 3577
#chromsize: chr2 7551
#samheader: @SQ SN:chr1 LN:3577
#samheader: @SQ SN:chr2 LN:7551
#samheader: CL:minimap2 -ay -t 2 @PG PN:minimap2 ID:minimap2 VN:2.24-r1122 map-ont -x
#samheader: PP:minimap2 CL:/home/epi2melabs/conda/bin/pore-c-py annotate - @PG PN:pore-c-py ID:pore-c-py-2 VN:2.0.1 --monomers porec_test.concatemers
#samheader: parse2 --output-stats porec_test.concatemers.stats.txt -c @PG ID:pairtools_parse2 PN:pairtools_parse2 CL:/home/epi2melabs/conda/bin/pairtools --single-end fasta.fai
#samheader: restrict -f fragments.bed -o @PG ID:pairtools_restrict PN:pairtools_restrict CL:/home/epi2melabs/conda/bin/pairtools extract_pairs.tmp porec_test.concatemers.pairs.gz
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type walk_pair_index walk_pair_type mapq1 mapq2 pos51 pos52 pos31 pos32 cigar1 cigar2 read_len1 read_len2 matched_bp1 matched_bp2 algn_ref_span1 algn_ref_span2 algn_read_span1 algn_read_span2 dist_to_51 dist_to_52 dist_to_31 dist_to_32 mismatches1 mismatches2 rfrag1 rfrag_start1 rfrag_end1 rfrag2 rfrag_start2 rfrag_end2
CONCAT0 + - UU 1 chr2 5443 chr1 3003 32 R1
CONCAT0 + - UU 2 chr1 1104 chr1 1103 60 R1
CONCAT0 + - UU 3 chr1 602 chr2 6455 60 R1
CONCAT0 + + UN 4 chr2 5530 ! 0 60 R1
CONCAT0 - - NU 5 ! 0 chr2 6538 0 R1
CONCAT0 + - UU 6 chr2 6456 chr2 5442 51 R1
CONCAT1 + - UU 1 chr1 3004 chr2 6538 60 R1
CONCAT2 + - UU 1 chr1 1104 chr1 601 60 R1

SooLee commented 1 year ago

You need to sort the file before indexing. See the Pairix doc about how to sort the file.
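A minimal sketch of the sorting step, for readers landing here with the same error. It assumes a tab-separated pairs file with chrom1, pos1, chrom2, pos2 in columns 2 to 5, and uses the same sort keys that appear later in this thread. The file names and the sample rows are made up for illustration. The header lines (those starting with `#`) are copied first and only the data rows are sorted, so the header is not shuffled in among the data.

```shell
#!/bin/sh
# Sketch: sort a .pairs file by chr1, chr2, pos1, pos2 and keep the header on top.
set -eu
workdir=$(mktemp -d)
in="$workdir/example.pairs"          # hypothetical input file
out="$workdir/example.sorted.pairs"  # hypothetical output file

# Tiny tab-separated example (made-up data).
printf '## pairs format v1.0\n' > "$in"
printf '#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2\n' >> "$in"
printf 'r1\tchr2\t5443\tchr2\t6000\t+\t-\n' >> "$in"
printf 'r2\tchr1\t1104\tchr2\t601\t+\t-\n' >> "$in"
printf 'r3\tchr1\t602\tchr1\t900\t+\t-\n' >> "$in"

# Copy the header lines unchanged, then sort only the data rows:
# chrom1 (col 2), chrom2 (col 4), pos1 (col 3, numeric), pos2 (col 5, numeric).
{
  grep '^#' "$in"
  grep -v '^#' "$in" | LC_ALL=C sort -t "$(printf '\t')" -k2,2 -k4,4 -k3,3n -k5,5n
} > "$out"

cat "$out"
# Next steps, assuming bgzip and pairix are installed:
#   bgzip "$out" && pairix -p pairs "$out.gz"
```

Running plain `sort` on the whole file treats the header lines as data, which is one way a header can end up at the bottom of the sorted output.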

molecule53 commented 1 year ago

I just tried sorting:

pairix sorted_Test_Galaxy_20230203_Pore-C-70K_C_fastq_to_bamsorted.pairs.txt.gz
[get_intv] the following line cannot be parsed and skipped: 0000575c-de6a-4338-bac8-cdd60d8c5a90 ! 0 ! 0 - + NN 1
[ti_index_core] the indexes overlap or are out of bounds

Here is my new sorted file before bgzip:

cat sorted_Test_Galaxy_20230203_Pore-C-70K_C_fastq_to_bamsorted.pairs.txt | tail -n 50
45bb6b15-44f2-4e56-82fa-8e21ac40c855 Chr5 26963502 Chr5 25595629 + + UU 3
2b35315c-299d-4d4b-aab3-4f6c41d02058 Chr5 26963502 Chr5 25602432 + - UU 6
c949465b-b890-4844-a28a-3806ceecb4f8 Chr5 26963502 Chr5 26961514 - + UU 4
712dbfd8-bd4a-4d2d-afd0-f6594617edb3 Chr5 26963505 Chr5 756533 - - UU 3
50f508a9-488e-4e39-8703-3341f0c9a70b Chr5 26963505 Chr5 26961514 - + UU 2
33cfaaa7-5c00-4ca3-9123-566f648944b6 Chr5 26963505 Chr5 26967237 - + UU 1
30a86009-39df-4fd1-9dc4-9411f673ee62 Chr5 26964548 Chr5 26566570 + + UU 2
9c1cf00f-1ab8-4447-9c77-3859f84dd40c Chr5 26965340 Chr5 63152 + + UU 2
7b40584f-1080-4e6c-a093-0df206ac0f62 Chr5 26965340 Chr5 26191756 + - UU 3
45a058c9-44e0-41bc-a4a8-0ab5fa43a3b8 Chr5 26965343 Chr5 11704004 - + UU 1
c949465b-b890-4844-a28a-3806ceecb4f8 Chr5 26965343 Chr5 26959362 - + UU 2
f3e8780e-8974-480a-b7d2-569686aeb629 Chr5 26965343 Chr5 26962556 - + UU 2
b0d06e56-181e-4dac-8398-345b061ad7e5 Chr5 26965343 Chr5 26962569 - + UU 2
a3ee7f1d-3dde-4ed8-9ebb-679eceb3eb9b Chr5 26965343 Chr5 26965344 - + UU 1
0f7fa3b6-57ad-478a-9c5b-81886b2b1a7b Chr5 26968475 Chr5 26968471 + - UU 4
0f7fa3b6-57ad-478a-9c5b-81886b2b1a7b Chr5 26968702 Chr5 1452557 + + UU 1
d2cfbf8e-f961-4a4f-b69f-2cfcb6f6ff2e Chr5 26968702 Chr5 20478337 + + UU 3
821da5c8-55cb-47d5-8c8f-3b2b27a96db1 Chr5 26968702 Chr5 26971328 + - UU 2
111685de-003c-4c25-af8a-6a70caa413d5 Chr5 26968705 Chr5 26576591 - - UU 1
33cfaaa7-5c00-4ca3-9123-566f648944b6 Chr5 26968705 Chr5 26963506 - + UU 2
f6c58d6b-1b2b-4bc1-986b-e8fdc038a1fe Chr5 26968705 Chr5 26971333 - + UU 3
d06413ae-e1f7-4a44-b1c5-dd8bee87606c Chr5 26968965 Chr5 689615 + - UU 2
7e829b73-1d2a-43e4-93ae-6051f2efbb46 Chr5 26968965 Chr5 3670881 + + UU 2
5696f59d-992e-44f3-bc43-c5d3e7b3310b Chr5 26968965 Chr5 26970537 + - UU 1
47eb03d3-49f1-4a13-92ad-faffe3a2d88b Chr5 26968965 Chr5 26972686 + - UU 2
52c598be-01bf-4989-9187-d499888473bd Chr5 26968965 Chr5 26972689 + - UU 2
6a5adb50-3aae-4ba9-b30e-9b0cc8b48c66 Chr5 26968965 Chr5 26972689 + - UU 1
1f297eda-db06-4b1c-82b2-c3a14f3ed04b Chr5 26968968 Chr5 3507431 - - UU 1
f6c58d6b-1b2b-4bc1-986b-e8fdc038a1fe Chr5 26968968 Chr5 26967251 - + UU 2
5696f59d-992e-44f3-bc43-c5d3e7b3310b Chr5 26969914 Chr5 26968964 + - UU 2
f2afa522-f322-4890-888a-fef4434b1441 Chr5 26970495 Chr5 24855805 - - UU 5
52c598be-01bf-4989-9187-d499888473bd Chr5 26971329 Chr5 26973320 + - UU 3
f3c3951b-c94f-4b2f-8e13-fd980e1e39e1 Chr5 26971332 Chr5 3398919 - - UU 1
f2afa522-f322-4890-888a-fef4434b1441 Chr5 26971332 Chr5 26376061 - - UU 1
d6d5e650-6579-4e6d-a015-d9d86ab9f83b Chr5 26971332 Chr5 26968706 - + UU 3
f6c58d6b-1b2b-4bc1-986b-e8fdc038a1fe Chr5 26971332 Chr5 26968706 - + UU 1
e012f3ec-dc9d-4a51-967e-b30b1aeb6cc0 Chr5 26972690 Chr5 26974046 + - UU 2
c8617e93-3a38-4334-8bac-ff627804bf88 Chr5 26972690 Chr5 26975502 + - UU 1
d6d5e650-6579-4e6d-a015-d9d86ab9f83b Chr5 26972693 Chr5 26968969 - + UU 2
259632b1-a672-41f7-86aa-b04605b15c69 Chr5 26972693 Chr5 26971333 - + UU 1
3be4e565-7a97-4bac-bc7d-775183af421f Chr5 26973548 Chr5 25085045 + - UU 3
74e4c442-d98e-451f-bbe3-a79e91288783 Chr5 26973548 Chr5 26975502 + - UU 2
cf14d609-5521-48bf-8944-3e4ba9c1152a Chr5 26973551 Chr5 88460 - + UU 7
ce284cca-8966-4db2-9220-e2f95e767bb9 Chr5 26973551 Chr5 2077524 - - UU 2
dd5c42b4-13db-4ef4-a30b-3e9604bd0208 Chr5 26973551 Chr5 26135458 - + UU 3
941c415e-b004-426b-9478-c48a9c05029e Chr5 26973551 Chr5 26766316 - - UU 1
## pairs format v1.0.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
#genome_assembly: unknown
#shape: whole matrix

SooLee commented 1 year ago

Your file is not sorted. Follow the Pairix doc to sort the file by chr1 chr2 pos1 pos2.
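The tail output posted above ends with the header lines, which suggests the header was sorted together with the data rows. A quick way to check a file before bgzip and pairix is sketched below. It assumes tab-separated columns with chrom1, pos1, chrom2, pos2 in columns 2 to 5; the file name and rows are made up, and the status messages are only illustrative.

```shell
#!/bin/sh
# Sketch: check that header lines come first and that data rows are
# sorted by chr1, chr2, pos1, pos2.
set -u
workdir=$(mktemp -d)
f="$workdir/check.pairs"   # hypothetical file to check

# Made-up example in which the header ended up after the data rows.
printf 'r1\tChr5\t100\tChr5\t50\n'  > "$f"
printf 'r2\tChr5\t200\tChr5\t10\n' >> "$f"
printf '## pairs format v1.0\n'    >> "$f"

status=ok
# 1. Every header line ('#' prefix) must come before the first data row.
if ! awk '/^#/ { if (seen_data) exit 1; next } { seen_data = 1 }' "$f"; then
  status="header lines found after data rows"
# 2. Data rows must already be in chr1, chr2, pos1, pos2 order
#    (sort -c only checks the order and writes nothing).
elif ! grep -v '^#' "$f" | LC_ALL=C sort -c -t "$(printf '\t')" -k2,2 -k4,4 -k3,3n -k5,5n 2>/dev/null; then
  status="data rows are not sorted"
fi
echo "$status"
```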

molecule53 commented 1 year ago

Sorry, it is me again.

Here is my original .pairs file ForSoring_porec_test.concatemers.pairs.txt.

Some lines have "!" instead of "chr#". Could that be a problem?

cat ForSoring_porec_test.concatemers.pairs.txt | head -n 20
## pairs format v1.0.0
#shape: whole matrix
#genome_assembly: unknown
#chromsize: chr1 3577
#chromsize: chr2 7551
#samheader: @SQ SN:chr1 LN:3577
#samheader: @SQ SN:chr2 LN:7551
#samheader: @PG PN:minimap2 ID:minimap2 VN:2.24-r1122
#samheader: @PG PN:pore-c-py ID:pore-c-py-2 VN:2.0.1
#samheader: @PG ID:pairtools_parse2 PN:pairtools_parse2 CL:/home/epi2melabs/conda/bin/pairtools
#samheader: @PG ID:pairtools_restrict PN:pairtools_restrict CL:/home/epi2melabs/conda/bin/pairtools
#columns: readID chrom1 pos1 chrom2
CONCAT0 chr2 5443 chr1 3003
CONCAT0 chr1 1104 chr1 1103
CONCAT0 chr1 602 chr2 6455
CONCAT0 chr2 5530 ! 0
CONCAT0 ! 0 chr2 6538
CONCAT0 chr2 6456 chr2 5442
CONCAT1 chr1 3004 chr2 6538
CONCAT2 chr1 1104 chr1 601

Then I sort:

.../Data1$ sort -k2,2 -k4,4 -k3,3n -k5,5n ForSoring_porec_test.concatemers.pairs.txt > Sorted_porec_test.concatemers.pairs.txt

.../Data1$ cat Sorted_porec_test.concatemers.pairs.txt | head -n 20
CONCAT107 ! 0 chr1 601
CONCAT148 ! 0 chr1 601
CONCAT171 ! 0 chr1 601
CONCAT175 ! 0 chr1 601
CONCAT180 ! 0 chr1 601
CONCAT185 ! 0 chr1 601
CONCAT211 ! 0 chr1 601
CONCAT27 ! 0 chr1 601
CONCAT277 ! 0 chr1 601
CONCAT31 ! 0 chr1 601
CONCAT312 ! 0 chr1 601
CONCAT353 ! 0 chr1 601
CONCAT471 ! 0 chr1 601
CONCAT491 ! 0 chr1 601
CONCAT512 ! 0 chr1 601
CONCAT514 ! 0 chr1 601
CONCAT593 ! 0 chr1 601
CONCAT611 ! 0 chr1 601
CONCAT619 ! 0 chr1 601
CONCAT638 ! 0 chr1 601

.../Data1$ cat Sorted_porec_test.concatemers.pairs.txt | tail -n 20
CONCAT829 chr2 6765 chr2 6764
CONCAT836 chr2 6765 chr2 6764
CONCAT841 chr2 6765 chr2 6764
CONCAT846 chr2 6765 chr2 6764
CONCAT870 chr2 6765 chr2 6764
CONCAT872 chr2 6765 chr2 6764
CONCAT875 chr2 6765 chr2 6764
CONCAT883 chr2 6765 chr2 6764
CONCAT885 chr2 6765 chr2 6764
CONCAT899 chr2 6765 chr2 6764
CONCAT901 chr2 6765 chr2 6764
CONCAT915 chr2 6765 chr2 6764
CONCAT931 chr2 6765 chr2 6764
CONCAT966 chr2 6765 chr2 6764
CONCAT989 chr2 6765 chr2 6764
CONCAT997 chr2 6765 chr2 6764
## pairs format v1.0.0
#columns: readID chrom1 pos1 chrom2
#genome_assembly: unknown
#shape: whole matrix

.../Data1$ bgzip Sorted_porec_test.concatemers.pairs.txt

.../Data1$ pairix Sorted_porec_test.concatemers.pairs.txt.gz
[get_intv] the following line cannot be parsed and skipped: CONCAT107 ! 0 chr1 601
[ti_index_core] the indexes overlap or are out of bounds

No index file generated!

SooLee commented 1 year ago

Try using 1-based index. Position 0 may be the problem. I think ! should be fine.
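One possible reading of this advice, sketched below, is to drop the rows whose position is 0 before sorting and indexing. In the files shown in this thread those are the rows where one side is unmapped (chromosome `!`). This is an assumption about what would help, not a fix confirmed in the thread, and it discards the unmapped pairs. The file names and rows are made up, and the columns are assumed to be tab-separated with pos1 in column 3 and pos2 in column 5.

```shell
#!/bin/sh
# Sketch: keep header lines and only those data rows where both positions are >= 1.
set -eu
workdir=$(mktemp -d)
in="$workdir/with_unmapped.pairs"   # hypothetical input file
out="$workdir/mapped_only.pairs"    # hypothetical output file

# Made-up example modelled on the rows shown in this thread.
printf '## pairs format v1.0\n'               > "$in"
printf 'CONCAT0\tchr2\t5530\t!\t0\n'         >> "$in"
printf 'CONCAT0\t!\t0\tchr2\t6538\n'         >> "$in"
printf 'CONCAT1\tchr1\t3004\tchr2\t6538\n'   >> "$in"

# Header lines start with '#'; data rows need pos1 (col 3) and pos2 (col 5) >= 1.
awk -F '\t' '/^#/ || ($3 >= 1 && $5 >= 1)' "$in" > "$out"

cat "$out"
```

If pairtools is part of the pipeline, filtering on the pair type (for example keeping only `UU` rows) before sorting may achieve a similar result.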

molecule53 commented 1 year ago

Hi, sorry, I am not sure what you mean. I have a sorted file at this point. How do I change it to a 1-based index, and at what step? Are there any specific instructions that I can use?

maize821 commented 1 year ago

I had the same problem. You can try these parameters: pairix -p pairs -f corrected2_porec_test.concatemers.pairs.txt.gz
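A small wrapper around that command is sketched below. As I understand the pairix usage text, `-p pairs` selects the pairs preset and `-f` overwrites an existing index. A likely reason the preset is needed here is that the file name ends in `.pairs.txt.gz` rather than `.pairs.gz`, so pairix may not recognise the format from the extension; that explanation is an assumption and is not confirmed in the thread. The script only runs pairix when both the tool and the file are present.

```shell
#!/bin/sh
# Sketch: index a bgzipped pairs file with an explicit preset.
set -eu
# File name taken from this thread; pass another path as the first argument.
pairs_gz="${1:-Sorted_porec_test.concatemers.pairs.txt.gz}"

if command -v pairix >/dev/null 2>&1 && [ -f "$pairs_gz" ]; then
  # -p pairs: parse with the pairs preset; -f: overwrite an existing index.
  pairix -p pairs -f "$pairs_gz"
  ran=yes
else
  echo "pairix or $pairs_gz not available; skipping" >&2
  ran=no
fi
echo "indexing attempted: $ran"
```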