Open calocascio opened 3 months ago
Sorry for the delay in getting back to you.
I can see the issue: ROBIN currently expects a reference whose chromosomes are named chr1, chr2 and so on. Your reference uses names such as GL000008.2.
You could try using this one:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
or this:
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/p14/hg38.p14.fa.gz
I am thinking of bundling a reference with the tool going forward. The main issue is that your reads will need to be mapped to a reference that uses the chr1-style nomenclature.
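One quick way to see whether a reference follows the chr1-style naming is to list its FASTA header names before aligning. The following is a minimal sketch, not part of ROBIN; the function names are hypothetical, and it assumes the contig name is the first whitespace-delimited token after `>` on each header line.

```python
def fasta_contig_names(lines):
    """Yield the contig name from each FASTA header line.

    Assumption: the name is the text after '>' up to the first whitespace.
    """
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def non_chr_names(names):
    """Return the contig names that do not start with 'chr'."""
    return [name for name in names if not name.startswith("chr")]
```

For a gzipped reference, the lines could be read with `gzip.open(path, "rt")` and passed straight to `fasta_contig_names`. An empty result from `non_chr_names` only shows that every name carries the `chr` prefix; it does not show that alternate or unplaced contigs are absent.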
Hi again,
Thank you for your answer! We tried a new run with the reference genome from the second link you suggested. It worked to generate a report, and I get copy number variation data for chr 1-22 + chr X, but I think the haplotype chromosomes and unplaced/unlocalized contigs are creating error messages. I attach the new log file: IPD0241.log
Thanks again for your support.
I'll have a look at this and get back to you asap.
If you got the report that is excellent!
Hi - can I just check - are you aligning the data to the same reference you are using for ROBIN or are the data aligned to the complete reference?
Hi! We used the hg38.p14.fa for both alignment and when running ROBIN. Sorry, I wasn't aware the patches were not complete. I do see that for example "chr1_GL383518v1_alt" is present in hg38.p14.fa. Which version do you recommend using for alignment?
Hi again, Sorry to add to the issue, but we tried the same again with another sample just to see, and this time it did not generate a report. The errors pertaining to that are towards the bottom of this error log: IPD1119.log
It seems the image size is too large. I'm not sure if this is related to the issue with the reference genome or if it's unrelated?
Just let me know if you need any other information!
ValueError: Image size of 521657841x958 pixels is too large. It must be less than 2^16 in each direction. 2024-09-12 11:25:32,404 - nicegui - ERROR - Image size of -1257769606x701 pixels is too large. It must be less than 2^16 in each direction.
Yes! That is a big image.
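The limit in the error message can be checked before a figure is rendered. The sketch below assumes the plot is drawn by a library where pixel size equals size in inches multiplied by DPI (as in matplotlib, which this message appears to come from); the function names are hypothetical, and nothing here is taken from ROBIN's code.

```python
MAX_PIXELS = 2 ** 16  # limit quoted in the error message, per direction


def pixel_size(width_in, height_in, dpi):
    """Return the (width, height) in pixels for a figure of the given size in inches."""
    return int(round(width_in * dpi)), int(round(height_in * dpi))


def fits_limit(width_in, height_in, dpi):
    """True if both pixel dimensions are positive and below the 2^16 limit."""
    width_px, height_px = pixel_size(width_in, height_in, dpi)
    return 0 < width_px < MAX_PIXELS and 0 < height_px < MAX_PIXELS
```

The check on positive values is there because the second reported width in the log is negative, which also fails the limit.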
Can I check which version of ROBIN you are currently running? If you do robin --version what do you get?
Actually - if you are on any version older than 0.1.0, it would be worth updating. That might not fix the problem, but it is worth checking!
Ah! I was using 0.0.6. I have now recloned it and have version 0.1.0. I tried running the same as before, but now I get this error: "Error: invalid value for '--bed_file' / '-b': File '/path/to/bed/file' does not exist."
I see now after running robin --help, that there are now many more options where it says "[required]". Most should be fine, but what do I do if I don't have a bed file (we have not run adaptive sampling)?
By the way, there may have been a problem with the index file for the reference genome I was using. I have now recreated the fasta.fai file, so we can see if that helps after this :)
Ah - that's a new feature.
You can pass it the link to this file which will be in the downloaded repository:
src/robin/resources/panel_11092024_5kb_pad.bed
It won't affect anything you do, but ROBIN assumes that these were the targets for some of the analysis steps.
Thank you - that's great! I ran it again like this:
robin --threads 4 -r /path/to/ref -b /path/to/bed --centreID "IPD1119" --basecall_config "guppy" --experiment_duration 72 -w /path/to/bam /path/to/output
Although I think I should have put "dorado" instead of "guppy". However, I didn't get any errors until I tried to create a report (log file: IPD1119.log). I also attach screenshots of the output in case you see anything there.
screenshots.zip
Please let me know if you see anything that looks wrong, and if you have any more thoughts about the correct reference genome to use when creating the BAM files.
Hi - yes you should have used dorado (though at the moment this will not matter at all).
I will try and recreate the report generation error here and see if I can solve it.
Hi - I'm struggling to reproduce the error on the current version of the code.
Could you check that when you type:
robin --version
you see something like:
0.1.0
Assuming that is the case, could you look in the output folder which should be in /path/to/output/
In this folder you should see a subfolder that corresponds to the data set you are analysing. Within that folder should be two files called CNV.npy and CNV_dict.npy
These files contain no sequence data, but do contain a description of the copy number profile. I think it is these files that are causing the issue. Would you be able to share those with me? Then I might be able to track down what is causing the problem.
Thanks.
Matt
Hi, Yes, I am now using version 0.1.0. Sure – here are the two files from this run: IPD1119_CNV.zip
Thank you!
Brilliant - this is very helpful. I've been able to track down at least part of the issue.
Could you try installing the version of robin on the branch I've just pushed.
https://github.com/LooseLab/ROBIN/tree/fix/reporting_multi_chrom
This should enable you to generate a report. If it works I will merge into the main branch.
Thanks - that's great! I tried it out (on another sample), and for some reason I can create a report after it has run for a little while, but not after it has finished. Since the report is 158 pages and too big to attach here, I'm only adding the first five pages of the report I got. It changed after this though: after it had finished processing all of the BAM files, the estimated coverage was about double, and I got results for nanoDx, for instance. But here it is: IPD0737-DXX-P01-F08_run_report_short.pdf . I also saved all of the output to the terminal, in case that is helpful: IPD0737_stdout.log .
Did you reinstall from the report branch above?
It appears that you may not have, because in your report I can see this:
This is showing that you have reads aligned to alts and unplaced contigs which it is trying to plot. The version of the code on the alternate branch above should (and perhaps it doesn't!) ignore those now.
If the BAM files that you have are aligned to a reference that does not include alts, unplaced contigs etc., then this shouldn't be happening.
Can you confirm the reference you are using? And also try installing from the reporting_multi_chrom branch as above?
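To confirm which contigs a BAM file was aligned against, the `@SQ` lines of its header can be inspected. The sketch below parses header text and separates primary chromosomes from everything else. It is an illustration only, not the filter used on the ROBIN branch; the function names are hypothetical, and the definition of "primary" (chr1 to chr22, chrX, chrY, chrM) is an assumption.

```python
import re

# Assumption: primary chromosomes are chr1..chr22, chrX, chrY and chrM.
PRIMARY = re.compile(r"^chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def sq_names(header_text):
    """Return the SN (sequence name) value of every @SQ line in a SAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def split_contigs(names):
    """Split contig names into (primary, other), preserving order."""
    primary = [name for name in names if PRIMARY.match(name)]
    other = [name for name in names if not PRIMARY.match(name)]
    return primary, other
```

The header text could come from `samtools view -H` on one of the BAM files. If `other` is non-empty, the data were aligned to a reference that includes alts or unplaced contigs.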
I did the following: git fetch origin, git checkout fix/reporting_multi_chrom, git submodule update --init --recursive. It looks like I am in the right branch when I run git branch, but let me know if I should have used different code. I am using this reference genome: , both for alignment and when running ROBIN, so maybe I need to use a different one?
Hmm, looks like the link didn't work. This one: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/p14/hg38.p14.fa.gz
Hi,
Can you run:
git checkout fix/reporting_multi_chrom
git pull
pip install -e .
Then run:
robin --version
And hopefully it will show
0.1.0b
If so then you have updated.
Then if you restart robin you should be able to browse to the previous run folder and generate the report from the existing analysis but without the multiple alignments showing (I hope!)
Ok, now it is showing version 0.1.0b. Fantastic – that seems to have worked! (see CNV screenshot: ). However, I'm sorry, but now there is a new error coming up (log file: IPD0737_run2.log ), stopping the report from being generated. Thank you for all of your help so far!
Hey - we are making progress!
This is good news :-)
I'll have a look at what is causing the date error. That one is odd and I haven't seen it before. I'll get back to you asap.
In the folder with all the results you should have a number of files named something followed by _scores.csv.
There could be three or four of these. Would you be able to share them with me?
Thanks.
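Collecting the `_scores.csv` files for sharing can be scripted. This is a minimal sketch with hypothetical function names; it assumes the files may sit either directly in the results folder or in a subfolder, so it searches recursively.

```python
import zipfile
from pathlib import Path


def find_score_files(results_dir):
    """Return all files ending in _scores.csv under results_dir, sorted by path."""
    return sorted(Path(results_dir).rglob("*_scores.csv"))


def zip_score_files(results_dir, archive_path):
    """Write the score files into a zip archive, keeping paths relative to results_dir."""
    root = Path(results_dir)
    files = find_score_files(root)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=path.relative_to(root).as_posix())
    return files
```

Keeping the relative path as the archive name avoids collisions if two subfolders contain score files with the same name.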
Great! :) yes - here are the four files zipped: scores_output.zip
Please could you try the latest version on the main branch (should be version 0.1.3). This may well have resolved some of the reporting issues you were seeing.
When running the pipeline, I get methylation classification results in the GUI, but there are several errors in the terminal. CNV data is missing, and I cannot generate a report from the GUI. I have attached the log file NB19-509.log . Please let me know if you need any additional information.
Environment details
Ubuntu 20.04.6
Additional context
The sequence data is from a PromethION 48 machine. The lab followed the ligation protocol, and we have not run adaptive sampling.
I ran the pipeline like this:
robin --threads 4 -r /data/GRCh38.p14.genome.fa -w /path/to/bam_pass /path/to/output
And I opened the URL http://10.54.216.13:8081 and pressed Live data.