Closed by saky7009 2 months ago
Hello! Would you be able to share the top few lines of one of your methylbed files and of the gene_cord_df? I suspect the code is failing to find the correct gene column.
methylbed files:
   Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9 Column10 Column11 Column12
1 HL4_0007     998     999       m       1       -     998     999     255        1        0        m
2 HL4_0007    1074    1075       h       1       -    1074    1075     255        1        0        h
3 HL4_0007    1074    1075       m       1       -    1074    1075     255        1      100        m
4 HL4_0007    1083    1084       h       1       -    1083    1084     255        1        0        h
5 HL4_0007    1083    1084       m       1       -    1083    1084     255        1      100        m
6 HL4_0007    1090    1091       h       1       -    1090    1091     255        1        0        h
gene_cord_df...
print(head(gene_cord_df))
  Chromosome      Chr        gene.ID Strand   Low  High Adapt_Low Adapt_High
1   HL4_0001 HL4_0001 ID=FUN_000001;      + 23352 23664     23352      23664
2   HL4_0001 HL4_0001 ID=FUN_000002;      - 28579 29021     28579      29021
3   HL4_0001 HL4_0001 ID=FUN_000003;      - 29624 30085     29624      30085
4   HL4_0001 HL4_0001 ID=FUN_000004;      + 31086 33814     31086      33814
5   HL4_0001 HL4_0001 ID=FUN_000005;      - 34359 35221     34359      35221
6   HL4_0001 HL4_0001 ID=FUN_000006;      - 44608 46071     44608      46071
print(str(gene_cord_df))
'data.frame': 69129 obs. of 8 variables:
 $ Chromosome: chr "HL4_0001" "HL4_0001" "HL4_0001" "HL4_0001" ...
 $ Chr       : chr "HL4_0001" "HL4_0001" "HL4_0001" "HL4_0001" ...
 $ gene.ID   : chr "ID=FUN_000001;" "ID=FUN_000002;" "ID=FUN_000003;" "ID=FUN_000004;" ...
 $ Strand    : chr "+" "-" "-" "+" ...
 $ Low       : int 23352 28579 29624 31086 34359 44608 55307 69501 71384 74400 ...
 $ High      : int 23664 29021 30085 33814 35221 46071 55617 69917 72081 74924 ...
 $ Adapt_Low : int 23352 28579 29624 31086 34359 44608 55307 69501 71384 74400 ...
 $ Adapt_High: int 23664 29021 30085 33814 35221 46071 55617 69917 72081 74924 ...
Sorry for the delay, I was traveling. Looking at your methylbed files, it appears that the same positions are duplicated across different rows. Are you using the 5hmC model in Dorado? Our pipeline is designed to use the 5mC model, which leads to some crucial differences in the input data. You can take a look at this issue for more information.
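To make the duplicate-position point concrete, here is a minimal sketch in base R using the rows from the methylbed excerpt above. The column names (`chrom`, `start`, `end`, `code`, `strand`) are labels chosen for this illustration, not names used by the pipeline, and keeping only the `m` rows is an assumption about what one might try, not a fix confirmed by the maintainers.

```r
# Toy bedMethyl-like rows copied from the excerpt above
# (only the columns needed for the check; names are illustrative).
bed <- data.frame(
  chrom  = rep("HL4_0007", 6),
  start  = c(998, 1074, 1074, 1083, 1083, 1090),
  end    = c(999, 1075, 1075, 1084, 1084, 1091),
  code   = c("m", "h", "m", "h", "m", "h"),
  strand = rep("-", 6),
  stringsAsFactors = FALSE
)

# A position counts as duplicated when the same chrom/start/end/strand
# combination occurs in more than one row.
key <- paste(bed$chrom, bed$start, bed$end, bed$strand, sep = ":")
dup_positions <- unique(key[duplicated(key)])
print(dup_positions)

# Which modification codes are present? Seeing "h" next to "m" suggests
# that the calls came from a combined 5mC + 5hmC model.
print(table(bed$code))

# Assumption: keep only the 5mC ("m") rows so each position appears once.
bed_m <- bed[bed$code == "m", ]
print(any(duplicated(bed_m[, c("chrom", "start", "end", "strand")])))
```

If the last line prints `FALSE`, each position occurs once in the filtered table. Whether such filtered data are appropriate input for the pipeline is a separate question that the thread does not settle.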
I used the 5mC Dorado model and faced the same issue as the one reported here, although she didn't use the Geneco file.
  HL4_0007 X1074 X1075 m X1 X. X1074.1 X1075.1 X255.0.0 X1.1 X100.00 X1.2 X0 X0.1 X0.2 X0.3 X0.4 X0.5
1 HL4_0007  1083  1084 m  1  -    1083    1084  255,0,0    1     100    1  0    0    0    0    0    0
2 HL4_0007  1090  1091 m  1  -    1090    1091  255,0,0    1     100    1  0    0    0    0    0    0
3 HL4_0007  1115  1116 m  1  -    1115    1116  255,0,0    1     100    1  0    0    0    0    0    0
4 HL4_0007  1131  1132 m  1  -    1131    1132  255,0,0    1     100    1  0    0    0    0    0    0
5 HL4_0007  1147  1148 m  1  -    1147    1148  255,0,0    1     100    1  0    0    0    0    0    0
6 HL4_0007  9654  9655 m  1  -    9654    9655  255,0,0    1     100    1  0    0    0    0    0    0

  HL4_0007 X9674 X9675 m X1 X. X9674.1 X9675.1 X255.0.0 X1.1 X100.00 X1.2 X0 X0.1 X0.2 X0.3 X0.4 X0.5
1 HL4_0007  9685  9686 m  1  +    9685    9686  255,0,0    1       0    0  1    0    0    0    0    0
2 HL4_0007  9721  9722 m  1  +    9721    9722  255,0,0    1     100    1  0    0    0    0    0    0
3 HL4_0007  9763  9764 m  1  +    9763    9764  255,0,0    1     100    1  0    0    0    0    0    0
4 HL4_0007  9885  9886 m  1  +    9885    9886  255,0,0    1     100    1  0    0    0    0    0    0
5 HL4_0007  9918  9919 m  1  +    9918    9919  255,0,0    1     100    1  0    0    0    0    0    0
6 HL4_0007 10167 10168 m  1  +   10167   10168  255,0,0    1     100    1  0    0    0    0    0    0
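A side note on the printouts above: column names such as `X1074` and `X255.0.0` are what `read.table()` produces when `header = TRUE` is applied to a file that has no header line, so the first record becomes the column names and is lost as data. The sketch below reproduces this on a temporary toy file built from two rows of the excerpt. How the files were actually read for this printout, and how the pipeline reads them internally, is not shown in the thread, so treat this as an illustration only.

```r
# Write two headerless records (taken from the excerpt) to a temporary file.
tmp <- tempfile(fileext = ".bed")
writeLines(c(
  "HL4_0007 1074 1075 m 1 - 1074 1075 255,0,0 1 100.00 1 0 0 0 0 0 0",
  "HL4_0007 1083 1084 m 1 - 1083 1084 255,0,0 1 100.00 1 0 0 0 0 0 0"
), tmp)

# header = TRUE consumes the first record as column names;
# names that are not valid R names get an "X" prefix (X1074, X1075, ...).
with_header <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)

# header = FALSE keeps every record as data; columns are named V1, V2, ...
no_header <- read.table(tmp, header = FALSE, stringsAsFactors = FALSE)

print(names(with_header)[2:3])
print(nrow(with_header))  # one record fewer than the file contains
print(nrow(no_header))    # all records kept

unlink(tmp)
```

Losing one record per file would not by itself explain the ZoomFrame error, but it is worth ruling out when inspecting the input by hand.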
gene_coordinate_file <- "C:/habib/HL4_Ameiurus_nebulosus.geneco.csv"
Geneco <- read.csv(gene_coordinate_file)
head(Geneco)
  Chromosome      Chr      Gene_Name Strand   Low  High Adapt_Low Adapt_High
1   HL4_0001 HL4_0001 ID=FUN_000001;      + 23352 23664     23352      23664
2   HL4_0001 HL4_0001 ID=FUN_000002;      - 28579 29021     28579      29021
3   HL4_0001 HL4_0001 ID=FUN_000003;      - 29624 30085     29624      30085
NOTE: target_info is TRUE. Using all genes in the Geneco as targets. Use gene_list parameter to change
Creating the Megaframe
QC : No duplicates in C:/habib/Full/HC2N/HC2N_MOD_CALLS/bedtools/subset_HC2N_methyl_pileup.bed , proceeding
QC : No duplicates in C:/habib/Full/HC2T/HC2T_MOD_CALLS/bedtools/subset_HC2T_methyl_pileup.bed , proceeding
The experimental design file is now available in current directory!
QC : Megaframe looks good
QC: The plot provides information about missing data that can be filtered out in the next step by using the filter_NAs parameter
NOTE: Filtering NAs default is set to 1 (Total_samples/2). See documentation for ideas on how to use the filter
Megaframe is now available in current directory and in the R-env!
Creating the ZoomFrame!
Error in `$<-.data.frame`(`*tmp*`, "Gene", value = "ID=FUN_000004;") :
  replacement has 1 row, data has 0
Is it possible that you have no data within this genomic region: ID=FUN_000004?
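One way to check this is to count, for every gene in the Geneco, how many methylation calls fall inside its interval on the same contig. The sketch below uses a few rows copied from the excerpts above. The column names of the toy `bed` table are illustrative, the interval test ignores 0-based versus 1-based boundary details, and the final step only reproduces the R error on an empty data frame; it is not the pipeline's own code.

```r
# Gene coordinates copied from the Geneco excerpt (first four rows).
geneco <- data.frame(
  Chromosome = rep("HL4_0001", 4),
  Gene_Name  = c("ID=FUN_000001;", "ID=FUN_000002;",
                 "ID=FUN_000003;", "ID=FUN_000004;"),
  Low        = c(23352, 28579, 29624, 31086),
  High       = c(23664, 29021, 30085, 33814),
  stringsAsFactors = FALSE
)

# Methylation calls copied from the bed excerpt (all on contig HL4_0007).
bed <- data.frame(
  chrom = rep("HL4_0007", 3),
  start = c(1083, 1090, 1115),
  stringsAsFactors = FALSE
)

# Count the calls inside each gene interval on the matching contig.
geneco$n_sites <- vapply(seq_len(nrow(geneco)), function(i) {
  sum(bed$chrom == geneco$Chromosome[i] &
      bed$start >= geneco$Low[i] &
      bed$start <= geneco$High[i])
}, integer(1))

# Genes with zero calls are the ones that can end up as empty subsets.
print(geneco[geneco$n_sites == 0, c("Gene_Name", "n_sites")])

# Assigning a length-1 value as a new column of a 0-row data frame
# raises the same kind of error as in the log above.
empty <- bed[bed$chrom == "HL4_0001", ]
msg <- tryCatch({
  empty$Gene <- "ID=FUN_000004;"
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

In this toy data every gene has zero calls because the calls are on `HL4_0007` while the genes are on `HL4_0001`. Whether the same contig mismatch applies to the full files cannot be determined from the excerpts alone.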
NOTE: Filtering NAs default is set to 1 (Total_samples/2). See documentation for ideas on how to use the filter
Megaframe is now available in current directory and in the R-env!
Creating the ZoomFrame!
Error in `$<-.data.frame`(`*tmp*`, "Gene", value = "ID=FUN_000505;") :
  replacement has 1 row, data has 0