Open ksenia-krasheninnikova opened 1 year ago
Thanks for your feedback. The first bug is caused by the genome format and will be fixed in the next version. In the meantime, you can use seqkit to rewrap the genome sequence to 80 bases per line and run AutoHiC again. Regarding the second error, could you please provide the file /lustre/scratch123/tol/teams/tola/users/kk16/autohic_data/pxEimMaxi1/result/AutoHiC_pxEimMaxi1/autohic_results/chromosome/chromosome.png, which will help me debug?
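For anyone without seqkit at hand, the rewrapping suggested above can be sketched in plain Python. This is only an illustration of the idea, not part of AutoHiC; the function name `rewrap_fasta` and the demo record are made up for the example. With seqkit itself, the line width is normally set through the `-w` option of `seqkit seq`.

```python
import io


def rewrap_fasta(src, dst, width=80):
    """Copy FASTA records from src to dst, wrapping sequence lines at `width` bases."""
    seq_parts = []

    def flush():
        # Write the sequence collected so far in chunks of `width` characters.
        seq = "".join(seq_parts)
        for i in range(0, len(seq), width):
            dst.write(seq[i:i + width] + "\n")
        seq_parts.clear()

    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            flush()  # finish the previous record before starting a new one
            dst.write(line + "\n")
        else:
            seq_parts.append(line)
    flush()  # write the last record


# Small in-memory demo: one hypothetical scaffold with 200 bases on a single line.
demo_in = io.StringIO(">scaffold_1\n" + "ACGT" * 50 + "\n")
demo_out = io.StringIO()
rewrap_fasta(demo_in, demo_out)
lines = demo_out.getvalue().splitlines()
```

For real files, pass two open file handles (the input FASTA for reading, a new FASTA for writing) instead of the in-memory buffers.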
Thanks for the quick reply. The file is attached
Thanks for the documentation; it seems the problem has been identified. This image is the final global interaction heatmap of the chromosomes, and AutoHiC infers the number of chromosomes from it. In this case, however, the chromosomes cannot be determined from the image, which caused the AutoHiC runtime error. That error is currently being fixed. But as you can see from the heatmap, the assembly results are very poor, so AutoHiC cannot correct the errors and split the chromosomes. I recommend checking the data for problems.
Could you please specify the size of the genome, the size of the Hi-C data, animal or plant, diploid or polyploid?
It is a lepidopteran, and it may well be diploid.
This is very strange. Diploid Lepidoptera were also included when testing AutoHiC, and the assembly results were much better than the one you provided. I suspect that the scaffolding results are too poor or that the amount of Hi-C data is insufficient. Hope this helps.
It was a protist genome of 45 Mb. The Hi-C data is high coverage, but it is possible that it's problematic.
I can confirm that with a line length of 80 bp in the FASTA files the pipeline works correctly. Thank you for your help.
I've got AutoHiC results for a couple of datasets and would like to access the FASTA file for the assemblies labeled in the .html file as 'Before adjustment'. From the code, it seems this should be the 3d-dna assembly with the lowest estimated number of 'Translocation' + 'Inversion' errors. What happens when these sums are equal at different iterations (but the 'Debris' numbers differ)? Also, what would be the best way to extract the FASTA file for it? Thank you.
UPD: I also wonder whether it's possible to tell the scaffold names for the Hi-C maps in the 'Location' field under the 'Error adjustment' section.
Thank you for your feedback. First of all, judging from your result report, the result for ilMicArun2 is relatively good, but there seems to be something wrong with the final number of chromosomes, which may need to be adjusted manually according to your actual situation. The result for gfHygPuni3 does not look very good, and I don't know why.
autohic_results
directory.autohic_results
directory. You can refer to this link: https://github.com/Jwindler/AutoHiC/blob/main/example/detail_result.md
Thank you for the reply. The errors reported in ilMicArun2.result.html are only present in autohic_results/0/inversion_error.json and autohic_results/0/idebris_error.json. However, autohic_results/0 doesn't contain any fasta files. Which fasta file should I refer to in this case? Thanks.
The first results are in the hic_results/3d-dna directory; 0, 1 and 2 have no fasta files. AutoHiC will adjust and generate fasta files based on the best results. If you want to get the fasta files of 0, 1 and 2, you can get the x.assembly file from the hic_results/3d-dna directory and use the following command to generate the fasta file:
bash run-asm-pipeline-post-review.sh -r adjusted.assembly genome.fasta merged_nodups.txt
Please specify the absolute path of each file:
run-asm-pipeline-post-review.sh is in the 3d-dna folder
adjusted.assembly is output from onehic.py
merged_nodups.txt is output from Juicer
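Since the command above needs absolute paths for every file, one way to avoid mistakes is to build it programmatically. The following is a minimal sketch, not an AutoHiC or 3d-dna utility: the helper name `post_review_command` and the relative paths in the demo are hypothetical, and only the `-r` argument order shown in the reply above is assumed.

```python
import os
import shlex


def post_review_command(script, assembly, genome, merged_nodups):
    """Return the post-review command as an argument list with absolute paths."""
    script, assembly, genome, merged_nodups = (
        os.path.abspath(p) for p in (script, assembly, genome, merged_nodups)
    )
    # Same argument order as in the reply: -r <assembly> <genome> <merged_nodups>
    return ["bash", script, "-r", assembly, genome, merged_nodups]


# Hypothetical relative paths; replace them with the locations on your system.
cmd = post_review_command(
    "3d-dna/run-asm-pipeline-post-review.sh",
    "adjusted.assembly",
    "genome.fasta",
    "merged_nodups.txt",
)
print(shlex.join(cmd))  # shell-quoted command line, ready to copy or run
```

The resulting list can also be passed directly to `subprocess.run(cmd, check=True)` once the paths point at real files.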
Hello,
I've been trying to run AutoHiC for two datasets from scratch. In both cases the bwamem, juicer and 3d-dna steps seem to finish correctly; at least they didn't report any critical errors. But at the later stages both runs failed with different errors.
The first dataset is for an insect, Nudaria mundana. The error is
File /lustre/scratch123/tol/teams/tola/users/kk16/autohic_data/ilNudMud1/result/AutoHiC_ilNudMud1/autohic_results/3/ilNudMud1.final.hic cannot be opened for reading
Indeed there is no such file but the folder contents are
Another one is a high-coverage dataset for a protist Eimeria maxima, where the error is
The log files are attached.
autohic_ilNudMud121106.txt autohic_pxEimMaxi123763.txt
I wonder if it's possible to get some help with troubleshooting?
Many thanks!