Closed H1889 closed 4 months ago
One more thing after running my command it is displayed:
"Now processing: chipap.py --mode single --peak narrow --chipR1 dark_1.fastq.gz dark_2.fastq.gz --ctrlR1 INPUT.fastq.gz --output /media/yomismo/Datos/CHIPSEQ_RNASEQ/CHIPSEQ --setname DARK_CONTROL --ref hg38 --genome /media/yomismo/Datos/CHIPSEQ_RNASEQ/CHIPSEQ/CHIP_AP_GENOME/crassa --custom_setting_table /media/yomismo/Datos/CHIPSEQ_RNASEQ/CHIPSEQ/setting_table.tsv --fcmerge --homer_motif union --meme_motif union --deltemp --thread 32 --run"
So the program is asuming the hg38 human genome as reference
I have discovered a possible bug in the guide to generate the custom genome folder, I think that besides the 3 folders bwa, GEM and bbmap it is necesary another folder called chrom.sizes whith the file crassa.chrom.sizes
Hello, thank you so much for your kind comments. Please give us a few days to diagnose, fix, and test. Will get back to you as soon as possible. Cheers.
Thank you for your help. I forgot one detail. When chip-ap runs with a custom genome it must also need the genome annotation files to be able to annotate the peaks, how can those files be generated from a GTF or GFF file? In my chip-ap installation I see a folder called:
.../share/homer/data/genomes
where there are the folders for the precalculated genomes dm6, hg19,hg38,mm10,mm9 and sacCer3. An in each of those folders the annotationa files, for instance for sacCer3: sacCer3/ ├── annotations │ ├── basic │ │ ├── centromeres.ann.txt │ │ ├── coding.ann.txt │ │ ├── cpgIsland.ann.txt │ │ ├── exons.ann.txt │ │ ├── gaps.ann.txt │ │ ├── intergenic.ann.txt │ │ ├── introns.ann.txt │ │ ├── miRNA.ann.txt │ │ ├── promoters.ann.txt │ │ ├── protein-coding.ann.txt │ │ ├── pseudo.ann.txt │ │ ├── tts.ann.txt │ │ ├── utr3.ann.txt │ │ └── utr5.ann.txt │ ├── custom │ └── repeats ├── chrI.fa ├── chrII.fa ├── chrIII.fa ├── chrIV.fa ├── chrIX.fa ├── chrM.fa ├── chrom.sizes ├── chrV.fa ├── chrVI.fa ├── chrVII.fa ├── chrVIII.fa ├── chrX.fa ├── chrXI.fa ├── chrXII.fa ├── chrXIII.fa ├── chrXIV.fa ├── chrXV.fa ├── chrXVI.fa ├── preparsed │ ├── sacCer3r.0.pos │ ├── sacCer3r.401.cgbins │ ├── sacCer3r.401.cgfreq │ ├── sacCer3r.401.gcbins │ ├── sacCer3r.401.pos │ ├── sacCer3r.401.seq │ ├── sacCer3r.507.cgbins │ ├── sacCer3r.507.cgfreq │ ├── sacCer3r.507.gcbins │ ├── sacCer3r.507.pos │ ├── sacCer3r.507.seq │ ├── sacCer3r.759.cgbins │ ├── sacCer3r.759.cgfreq │ ├── sacCer3r.759.gcbins │ ├── sacCer3r.759.pos │ ├── sacCer3r.759.seq │ ├── sacCer3r.789.cgbins │ ├── sacCer3r.789.cgfreq │ ├── sacCer3r.789.gcbins │ ├── sacCer3r.789.pos │ ├── sacCer3r.789.seq │ ├── sacCer3r.870.cgbins │ ├── sacCer3r.870.cgfreq │ ├── sacCer3r.870.gcbins │ ├── sacCer3r.870.pos │ ├── sacCer3r.870.seq │ ├── sacCer3r.916.cgbins │ ├── sacCer3r.916.cgfreq │ ├── sacCer3r.916.gcbins │ ├── sacCer3r.916.pos │ └── sacCer3r.916.seq ├── sacCer3.aug ├── sacCer3.basic.annotation ├── sacCer3.full.annotation ├── sacCer3.miRNA ├── sacCer3.repeats ├── sacCer3.rna ├── sacCer3.splice3p ├── sacCer3.splice5p ├── sacCer3.stop ├── sacCer3.tss └── sacCer3.tts
How are those files generated?
Sorry for the inconvenience. best
I have realized that the annotation files and GO-analysis files are generated by perl scripts from the HOMER package, so, theoretically I can add those files running the scripts
Hi, thank you for your detailed bug reports. It really helps us to improve streamlining the process for custom genomes. I have made necessary changes to the scripts to accommodate custom genome like yours, and will potentially be publishing a new ChIP-AP release in one or two days.
Since it is not a widely established reference, I am not sure the N. crassa genome (NC12 as in NCBI RefSeq version) and annotation that I got my hands on is identical to what you got over there. Therefore, for now, please download and take a look at the attached file below. Can you help to fill in (or edit, to be precise) the "chromosome" names (line 86 & 215) and their lengths (line 215). To easily check the lengths, you can use the command "faidx filename.fa -i chromsizes". Additionally, you may also edit the genome name in that file from NC12 to crassa, as that is how you name your genome reference.
Once you've done that, please resend it back to me so I can test if ChIP-AP is able to process everything from start to finish.
PS: BTW, have you successfully created the necessary annotation files using the perl scripts from HOMER?
Thank you, very,very much for your help. The chromosome names and sizes are ok, they are the same I am using. So I have not edited the 86 and 215 lines. Regarding if I have been able to get the HOMER files the answer is no, it gave me an error, surely I have not done something right. If I get it I will let you know. Regards
Hi, attached are the resources you will need to run with your ncrassa custom genome.
Cheers! Thank you for helping us improve ChIP-AP!
Thank you very much! I will try it tomorrow and let you know how it went. thank you again for making our lives easier
best
I have followed your instructions, however it was stopped by this error:
I do not understand why it says bwa is aligning paired-end, my data are single-end, this was my command:
chipap.py --mode single --peak narrow --chipR1 dark_1.fastq.gz dark_2.fastq.gz --ctrlR1 INPUT.fastq.gz --output /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ --setname DARK_CONTROL --ref NC12 --genome /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/CHIP_AP_GENOME --custom_setting_table /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/setting_table.tsv --fcmerge --goann --homer_motif union --meme_motif union --deltemp --thread 32 --run
Thanks
I have detected the error, my NC12.fa.pac file was named as NC12fa.pac, anyway, why bwa says it is aligning paired-end?
Now is running again. Hope this time all goes well. Best
It was stooped by a new error:
Opening file: /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/22_peaks_processing/DARK_CONTROL_all_peaks_annotated.tsv
Running the calculator on ChIP replicate DARK_CONTROL_chip_merged and DARK_CONTROL_ctrl_merged
chip_uniquely_mapped_read_count: 9467721
ctrl_uniquely_mapped_read_count: 42678599
chip_normalization_factor_value: 0.22183767091323686
ctrl_normalization_factor_value: 1
Traceback (most recent call last):
File "/media/yomismo/Datos/chipap_installation/chipap_scripts/fold_change_calculator.py", line 485, in
Any suggestion? thanks and sorry for the inconvenience
Hello, thanks for running the pipeline and reaching back.
The paired-end notification is a mistake in my script, so don't worry about it because bwa is actually running single-end in your case. Replace your chipap.py with the one in the attachment and that trivial problem will be no more.
As for the GC% and CpG% error, replace your fold_change_calculator.py with the one in the attachment. As far as I tested, it should resolve your issue above.
Hopefully it goes well! Please let me know how it turns up later.
A new error at meme:
meme_sequence_extractor.py: error: argument --ref: invalid choice: 'NC12' (choose from 'hg19', 'hg38', 'mm9', 'mm10', 'dm6', 'sacCer3')
Hi again, have you resolved the new error at meme? Thanks and sorry for the inconvenience
Sorry, I missed the notification from your comment 5 days ago. Please replace your meme_sequence_extractor.py with the one from the attachment. Please feel free to ask further questions.
Cheers!
don't worry, you are already making enough effort. I will try it and let you know the results, thanks a million.
Hi again, the run has ended as follows:
Generating background sequences for target sequences replicate 1...
Background sequences (replicate 1) has been generated: /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/25_meme_motif_enrichment/union_peak_set/background.fa Background sequences (replicate 1) stats: Number of sequences: 7559 Length of sequences: 1105 Total G or C characters: 3973966 Total A or T characters: 4378729 Individual background sequence GC content: 0.4758
Is this correct?
Greetings
Hi, so inside the bottom last folder (25_meme_motif_enrichment), can you find a folder named motif_analysis inside your union_peak_set folder? If yes, can you find meme-chip.html inside that folder? If yes, can you see the top motif candidates when you open that file? If you can for all three things above, then the pipeline has completed successfully.
Let me know if anything still did not work or if any of the resulting outputs looks weird to you.
Cheers!
Thank you very much. The answer is yes, I can see all those files and folders and the motifs. it seems everything is now ok. I will patiently check all the results and write you if I detect anything anomalous.
All the best
Hi. I have checked all the folders and files and seems everything is ok. Only the GO columns are empty. Homer has no data for N. crassa linking annotation and go-terms. A detail that can improve the app is to compute the motifs in the consensus mode with the maximum of methods, i.e, in my data GEM detects no peaks but if I run in mode consensus no result is shown given that one of the methods failed: MACS2 GEM HOMER Genrich Total Name X 7145 MACS2 X X 4 MACS2|Genrich X X 361 MACS2|HOMER X X X 48 MACS2|HOMER|Genrich
in that case my motif prediction must be based on the 48 common peaks or give the possibility to select 2, 3 or 4 methods to compute motif prediction. Evenmore, you can give the option to compute peaks given a minimum FC and/or minium -logIDR. Thanks
Hi, thanks for the feedback.
Can you recheck whether GEM returned an empty peak list (which is not so rare if peaks are rather hard to detect in general), or it somehow crashes during peak calling (which sometimes happen since it is the most resource-demanding peak caller among others)? You can check by looking at the two log files.
Looking at the number of called peaks, there is a high chance that only MACS2 managed to call peaks properly in your dataset. Since the union peak set is basically equal to your MACS2 peak set, I would check if it the motif enrichment analysis did call the expected motif (if you have a motif reference for your target).
As for doing the motif enrichment analysis with 2 or 3 peak callers, it is in our plan to implement in the future. However, with each step being carried out by modular scripts, you can easily do it yourself by opening "24_homer_motif_enrichment_consensus_script.sh" and changing the number 4 in "awk 'NR == 1 ; $7 == 4'" into 2 or 3 and rerun. That should do the trick for you. You can also generate multiple results for both 2 and 3 peak caller overlaps by simply changing the naming of the files.
If you want to filter with FC or IDR or practically anything else, you can simply open the xxxxx_all_peaks_calculated.tsv file in folder 22, use a spreadsheet application like MS Excel or LibreOffice Calc to freely filter the peaks you want, then remove all other columns except column 1 to 5, save it, and feed it directly to findMotifsGenome.pl (see "24_homer_motif_enrichment_consensus_script.sh" for an example; remember to modify the filenames as needed)
Now for meme, you can open "25_meme_motif_enrichment_consensus_script.sh" and change the flag argument "--filter 4" into 2 or 3 and rerun. Same thing, you can also generate multiple results. Just play around with the file naming.
If you want to filter with FC or IDR or practically anything else, you can simply open the xxxxx_all_peaks_calculated.tsv file in folder 22, use a spreadsheet application like MS Excel or LibreOffice Calc to freely filter the peaks you want,, save it, and feed it to meme_sequence_extractor.py while removing the flag argument "--filter 4" (see "25_meme_motif_enrichment_consensus_script.sh" for an example; remember to modify the filenames as needed).
Hope this helps! Cheers!
Perfect explanation, I will do as you said. Respect to GEM this is the err log:
" Warning: The read distribution is not updated, too few (499<500) significant events.
Warning: The read distribution is not updated, too few (496<500) significant events.
chrNC_026614.1.fa file is not in correct FASTA format."
An this is the out log:
"Welcome to GEM (version 2.7)!
Please cite: Yuchun Guo, Shaun Mahony, David K. Gifford (2012) PLoS Computational Biology 8(8): e1002638. High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. doi:10.1371/journal.pcbi.1002638
Gifford Laboratory at MIT (http://cgs.csail.mit.edu/gem/).
Start time: 2024/07/02 14:10:20
Loading data... Loading reads from: DARK_CONTROL_chip_rep1.bam ... Loaded Loading reads from: DARK_CONTROL_chip_rep2.bam ... Loaded Loading reads from: DARK_CONTROL_ctrl_rep1.bam ... Loaded
Options: -Xmx10G --k_min 8 --k_max 12 --t 16 --d /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/CHIP_AP_GENOME/GEM/Read_Distribution_default.txt --g /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/CHIP_AP_GENOME/GEM/NC12.chrom.sizes --genome /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/CHIP_AP_GENOME/GEM/NC12_Chr_FASTA --expt /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/08_results/DARK_CONTROL_chip_rep1.bam --expt /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/08_results/DARK_CONTROL_chip_rep2.bam --ctrl /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/08_results/DARK_CONTROL_ctrl_rep1.bam --f SAM --out /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/12_gem_peak_calling/DARK_CONTROL_GEM
Mappable Genome Length is 32.881.902.
Getting 5' positions of all reads...
Original read count stats: _IP Bases: 8223078 HitCounts: 9467707.0 _CTRL Bases: 35228650 HitCounts: 4.2678598E7
Filter duplicate reads. _IP Bases: 8223078 HitCounts: 9121074.0 _CTRL Bases: 35228650 HitCounts: 4.2283689E7
At Poisson p-value 1,0e-03, in a 501bp window, expect 177 reads.
IP: 9121074 Ctrl: 42283689 IP/Ctrl: 0,22
Sorting reads and selecting enriched regions ... For condition , IP/Control = 0,01
The genome is segmented into 43461 regions for analysis.
============================ Round 0 ============================
Running with 16 threads ...
100 /43461 56,1s 1000 /43461 1,1m 10000 /43461 1,3m 20000 /43461 1,3m 43461 /43461 1,3m
Scaling condition , IP/Control = 0,20
Events discovered Significant: 499 Insignificant: 30 Filtered: 47
Total 280 homotypic events (within 501bp of other events).
Finish binding event prediction: 1,3m
============================ Round 1 ============================
Running with 16 threads ...
Events discovered Significant: 496 Insignificant: 27 Filtered: 46
Total 280 homotypic events (within 501bp of other events).
Finish binding event prediction: 18,1s
Loading genome sequences ..."
the *_GEM_GPS_events.txt file has 497 lines, these a re the top 10:
osition IP Control Fold Expectd Q-lg10 P-lg10 P_poiss IPvsEMP Noise KmerGroup MotifId KG_hgp Strand NW_011929459.1:94803 35723,6 449,1 79,5 749,1 999,00 999,00 999,00 -0,42 0,28 -------- -1 0.00 NW_011929459.1:96953 32570,1 654,6 49,8 749,1 999,00 999,00 999,00 -0,59 0,67 -------- -1 0.00 NW_011929459.1:96590 28584,5 958,2 29,8 749,1 999,00 999,00 999,00 -0,40 0,67 -------- -1 0.00 NW_011929459.1:97513 26108,7 753,3 34,7 749,1 999,00 999,00 999,00 -0,38 0,67 -------- -1 0.00 NW_011929459.1:94205 17442,8 410,9 42,4 749,1 999,00 999,00 999,00 -0,61 0,33 -------- -1 0.00 NW_011929459.1:97180 16655,7 397,9 41,9 749,1 999,00 999,00 999,00 -0,72 0,67 -------- -1 0.00 NW_011929459.1:93959 14044,4 281,0 50,0 749,1 999,00 999,00 999,00 -0,63 0,33 -------- -1 0.00 NW_011929459.1:97329 13835,9 307,2 45,0 749,1 999,00 999,00 999,00 -0,70 0,67 -------- -1 0.00 NW_011929459.1:94975 13285,0 674,6 19,7 749,1 999,00 999,00 999,00 -0,52 0,28 -------- -1 0.00 NW_011929459.1:94375 10101,5 887,9 11,4 749,1 999,00 999,00 999,00 -0,64 0,33 -------- -1 0.00
An these the bottom 10: NC_026505.1:2319556 462,4 55,0 8,4 389,0 3,82 80,99 3,85 -0,50 0,00 -------- -1 0.00 NC_026505.1:5847216 501,3 66,8 7,5 428,6 3,50 82,95 3,53 -0,62 0,00 -------- -1 0.00 NC_026502.1:3018407 251,6 71,1 3,5 201,6 3,44 24,13 3,46 -0,58 0,00 -------- -1 0.00 NC_026501.1:2234890 248,0 80,7 3,1 198,6 3,37 20,42 3,39 -0,51 0,00 -------- -1 0.00 NC_026503.1:4608131 258,6 47,9 5,4 209,8 3,22 35,72 3,24 -0,67 0,00 -------- -1 0.00 NC_026502.1:2089670 390,1 44,1 8,8 332,4 3,00 69,52 3,02 -0,50 0,00 -------- -1 0.00 NC_026505.1:5715699 294,0 107,3 2,7 244,3 2,94 20,57 2,96 -0,46 0,00 -------- -1 0.00 NC_026501.1:5354740 314,0 59,2 5,3 265,8 2,73 42,44 2,75 -0,57 0,00 -------- -1 0.00 NC_026504.1:2182132 498,0 106,7 4,7 435,6 2,71 60,79 2,74 -0,22 0,60 -------- -1 0.00 NC_026507.1:3908647 373,0 74,4 5,0 324,2 2,34 48,17 2,37 -0,41 0,00 -------- -1 0.00 "
I have run 5 different experimental conditions of N. crassa and none gave GEM peaks. The computation resources are not a problem, my workstation has 32 cpus (64 threads) and 512Gb of RAM.
Thanks again for your help and involvement
It seems the problem is caused by your chrNC_026614.1.fa file. Can you try and see if it begins like below:
>NC_026614.1 GGATCCACTCGTAGAGTGGAATCACAGTTCTGTGGAGGCACGGCTATCCTATAGCGCGTCTTGCGACTGCGACGGAGCAG CTGGATTAGTACTGGACGGATCATAATCATCAACCTGGGTAGTTTAAAGGTAAAACCTTAATTTCATACATTAAAGATGA GAATTCGATTTTCTCCCCAGGTTCGCGTGCGCACTATGGAGGGGAAGGGCACTGATCGAGTAGGACCAGAGCTTTCCTTA TTTAAATCTCCAAATTTCTTCAAAATTTATGATAATAATGACTATACTGTCTTTATTACTTACTAATGCCGTTACTTTAA GACGAGATATTTCCATACTTTTTAATAGAATTGTGATTATAGCATTAATTTATTGTATTTTGCATGATACAATGAGTTTA TCTATTATAAGTAAGGGGGTAGGTTTACATGGAGGTTTACTTCATATAACCAATATAACACAAGTATTCCAAATATTTAT
Correct! The file has not the head >NC_026614.1 and none of the *.fa files of the GEM folder has the head. I will add the hedas manually and re-run the commands and tell you. Thnaks!
If you used the chromfa_splitter.py like below:
chromfa_splitter.py --fa [genome folder]/bwa/NC12.fa --output [genome folder]/GEM/NC12_Chr_FASTA/ --gem
Then you should have them all correct, supposedly. I believe I sent you an updated chromfa_splitter.py in one of the attachment above. I tested it and it should give you what you need.
I have GEM peaks now! However the GEM_IDR_output.tsv at folder 22 is empty. The GEM_IDR_input.narrowpeak seems ok.
Thease are the first lines of the input file:
chrNC_026507.1 1765596 1765646 chrNC_026507.1:1765596-1765646 0 . 135,9 -1 -1 -1 chrNW_011929459.1 94778 94828 chrNW_011929459.1:94778-94828 0 . 79,1 -1 -1 -1 chrNC_026503.1 1585442 1585492 chrNC_026503.1:1585442-1585492 0 . 74,1 -1 -1 -1 chrNC_026505.1 679271 679321 chrNC_026505.1:679271-679321 0 . 69,7 -1 -1 -1 chrNC_026501.1 1837395 1837445 chrNC_026501.1:1837395-1837445 0 . 55,6 -1 -1 -1 chrNW_011929459.1 93935 93985 chrNW_011929459.1:93935-93985 0 . 50,0 -1 -1 -1 chrNW_011929459.1 96939 96989 chrNW_011929459.1:96939-96989 0 . 49,3 -1 -1 -1 chrNC_026505.1 303710 303760 chrNC_026505.1:303710-303760 0 . 46,6 -1 -1 -1 chrNC_026505.1 679769 679819 chrNC_026505.1:679769-679819 0 . 45,4 -1 -1 -1
probably this is the source of the error:
"Peak --rank signal.value --use-nonoverlapping-peaks
/home/yomismo/mambaforge/envs/CHIP-AP/bin/idr -s /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/22_peaks_processing/DARK_CONTROL_all_peaks_calculated.narrowPeak /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/22_peaks_processing/IDR_files/GEM_IDR_input.narrowPeak -o /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/22_peaks_processing/IDR_files/GEM_IDR_output.tsv --input-file-type narrowPeak --rank signal.value --use-nonoverlapping-peaks
Traceback (most recent call last):
File "/home/yomismo/mambaforge/envs/CHIP-AP/bin/idr", line 10, in
and later:
"Opening file: /media/yomismo/Datos/VOS_CHIPSEQ_RNASEQ/CHIPSEQ/DARK_CONTROL/22_peaks_processing/IDR_files/GEM_IDR_output.tsv
Traceback (most recent call last):
File "/media/yomismo/Datos/chipap_installation/chipap_scripts/IDR_integrator.py", line 111, in
Thank you very much. Greetings
The problem is exactly this line:
ValueError: could not convert string to float: '135,9'"
For some weird reason, the decimal separators are all commas instead of periods in that .narrowPeak file. From all the GEM outputs you pasted above, it looks like your GEM uses comma as a decimal separator (and period as a thousand separator, at that). This is the first time I saw something like this. Do you have any idea how that happened? Does your computer, by chance, has comma as a decimal separator and period as a thousand separator, as a system default?
I have checked my decimal and thousands separators with locale: locale thousands_sep , locale decimal_point .
So the system is ok. The input file *_IDR_input.narrowPeak of MACS2, HOMER and Genrich all have decimal with ".". Only the one from GEM has decimals with ",".
The GEM narrowpeaks input file is generated from the event.txt file and that file is generated with "," as decimal, so de problem is focused in the GEM script at step 12. I realized that my yeast data I analysed with chip-ap three monts ago did not have GEM peaks, for that reason I did not see the "," decimal error.
Probably the problem is:
1-python script generating GEM chromosome files was wrong. 2-No GEM peaks were generated. 3-The decimal error can not be exposed. 4-When the GEM chromosome fasta files are correct, GEM gives peaks 5-When GEM peaks are availabe the hidden decimal error is exposed.
I am now checking the command generating GEM peaks as a candidate source for the error.
What is your oppinion? Do you think I am right?
best
I have solved the problem of decimals. Despite my system uses decimals with ".", GEM runs with Java that does not use the system configurtion, Java uses the language configuration, so I changed my language to american english and the problem was solved. Sorry for the inconvenience. Best
You can close the issue, I ran all the samples and everything went fine. Thank you very,very for your help Best
Good to hear! I am closing this thread. Don't hesitate to start a new issue if you encounter any more problem.
hello, thank you for making such a fantastic tool. I have already tested it with yeast and it works perfectly. However now I have to use it with my custom genome. I have followed the instructions to generate the files as indicated in the guide. I have placed them in a directory but I get an error:
Checking for suite requirements Please make sure bbduk adapter sequences reference fasta file "adapter.fa" exists, inside folder named "bbmap", inside your genome directory folder Example: [YOUR GENOME DIRECTORY]/bbmap/adapters.fa Please make sure bwa aligner genome reference fasta file "hg38.fa" exists, inside folder named "bwa", inside your genome directory folder Example: [YOUR GENOME DIRECTORY]/bwa/hg38.fa Please make sure GEM default read distribution reference file "Read_Distribution_default.txt" exists, inside folder named "GEM", inside your genome directory folder Example: [YOUR GENOME DIRECTORY]/GEM/Read_Distribution_default.txt Please make sure GEM hg38 chromosome sizes reference file "hg38.chrom.sizes" exists, inside folder named "GEM", inside your genome directory folder Example: [YOUR GENOME DIRECTORY]/GEM/hg38.chrom.sizes Please make sure GEM genome reference FOLDER "hg38_Chr_FASTA" exists (containing multiple individual fasta files for every chromosome), inside folder named "GEM", inside your genome directory folder Example: [YOUR GENOME DIRECTORY]/GEM/hg38_Chr_FASTA/ Exiting suite
I have checked all the required files and they are ok.
My question is: is it necessary to put in the path the name of the custom genome?
My genome is called crassa.fa and this is my command:
chipap.py --mode single --peak narrow --output /media/yomismo/Datos/CHIPSEQ_RNASEQ/CHIPSEQ --setname DARK_CONTROL --genome /media/yomismo/Datos/CHIPSEQ_RNASEQ/CHIPSEQ/CHIP_AP_GENOME/ --custom_setting_table /media/yomismo/Datos/CHIPSEQ_RNASEQ/CHIPSEQ/setting_table.tsv --fcmerge --homer_motif union --meme_motif union --deltemp --thread 32 --run --chipR1 dark_1.fastq.gz dark_2.fastq.gz --ctrlR1 INPUT.fastq.gz
This is my genome folder:
├── bbmap │ └── adapters.fa ├── bwa │ ├── crassa.fa │ ├── crassa.fa.amb │ ├── crassa.fa.ann │ ├── crassa.fa.bwt │ ├── crassa.fa.fai │ ├── crassa.fa.pac │ └── crassa.fa.sa └── GEM ├── crassa_Chr_FASTA │ ├── NC_026501.1.fa │ ├── NC_026502.1.fa │ ├── NC_026503.1.fa │ ├── NC_026504.1.fa │ ├── NC_026505.1.fa │ ├── NC_026506.1.fa │ ├── NC_026507.1.fa │ ├── NC_026614.1.fa │ ├── NW_011929459.1.fa │ ├── NW_011929460.1.fa │ ├── NW_011929461.1.fa │ ├── NW_011929462.1.fa │ ├── NW_011929463.1.fa │ ├── NW_011929464.1.fa │ ├── NW_011929465.1.fa │ ├── NW_011929466.1.fa │ ├── NW_011929467.1.fa │ ├── NW_011929468.1.fa │ ├── NW_011929469.1.fa │ ├── NW_011929470.1.fa │ └── NW_011929471.1.fa ├── crassa.chrom.sizes ├── Read_Distribution_BP.txt ├── Read_Distribution_ChIP-exo.txt ├── Read_Distribution_CLIP.txt └── Read_Distribution_default.txt
4 directories, 34 files
Thanks