etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Input for the import-rna command #479

Open SJRussell opened 4 years ago

SJRussell commented 4 years ago
  1. It's not clear which gene resource data to download from BioMart. I'm using the built-in hg38 gene info as a template, but BioMart doesn't have "NCBI gene ID" or "Transcript support level (TSL)" available for hg19. I'm going to try to get the original FASTA files for the project and remap to hg38, but it would be great to have a brief overview of what "gene-resource" info is required when working with non-hg38 genomes.
  2. I'd also like to build my own cnv expression correlates, but the input requirements for cnv_expression_correlate.py are not clear. Could you point me to a resource for building these inputs?
  3. There are several issues/questions about using counts as input for import-rna. Have those issues been fixed, or should I run RSEM instead of HTSeq-count to generate my sample input?

Thanks for your great tool and in advance for your help.
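
For readers comparing the two input routes in item 3: htseq-count writes a two-column, tab-separated table (feature ID, count) followed by summary rows whose names start with `__`, such as `__no_feature`. The sketch below loads such a file into a dict and skips those summary rows. The function name `read_htseq_counts` is hypothetical, and whether import-rna needs this cleanup is not stated in this thread.

```python
import csv


def read_htseq_counts(handle):
    """Parse htseq-count output into {feature_id: count}.

    Rows whose ID starts with "__" (e.g. "__no_feature") are
    htseq-count's own summary counters, not genes, so they are skipped.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts
```

Usage would look like `with open("sample.counts.tsv") as fh: counts = read_htseq_counts(fh)`, where the file name is a placeholder.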

etal commented 4 years ago

Thanks for checking in, and sorry for the trouble.

  1. Offhand I'm not sure what to do about BioMart's lack of support for hg19 etc. Do you know of another BioMart source that might have these, or a way to do it in R?
  2. At the moment the code is the spec, and I agree some proper docs are necessary here. (Tagging this ticket.) Some manual wrangling of the tables is needed.
  3. There's a chance it's been fixed -- if it's quick to check, then try it, otherwise RSEM would be a viable workaround.

SJRussell commented 4 years ago

Thanks for the response. For now:

  1. I've mapped to hg38 instead of trying to get the right output from BioMart. The difficulty was that I didn't know which fields were required for the gene-resource file.
  2. If you update the documentation to include details on how to build custom expression correlates, please mention it in this ticket. It seems to me that the accuracy of the algorithm depends on experiment-specific expression correlates.
  3. After installing CNVkit with conda, running any cnvkit.py command gave this error:

         Traceback (most recent call last):
           File "/home/stewart/anaconda3/envs/cnvkit/bin/cnvkit.py", line 8, in <module>
             from cnvlib import commands
           File "/home/stewart/cnvkit/cnvlib/__init__.py", line 4, in <module>
             from .cmdutil import read_cna as read
           File "/home/stewart/cnvkit/cnvlib/cmdutil.py", line 7, in <module>
             from .cnary import CopyNumArray as CNA
           File "/home/stewart/cnvkit/cnvlib/cnary.py", line 9, in <module>
             from . import core, descriptives, params, smoothing
           File "/home/stewart/cnvkit/cnvlib/smoothing.py", line 152
             x, wing, *padded = check_inputs(x, width, False, weights)
                      ^
         SyntaxError: invalid syntax

     After reinstalling with pip, the issue seems to be resolved. I also ran with -f counts and it appears to give log2 values, suggesting that counts from STAR or HTSeq-count can be used.
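
One plausible reading of the traceback in item 3: the starred target in `x, wing, *padded = ...` is extended iterable unpacking (PEP 3132), which only parses on Python 3, so an environment whose `python` resolved to a Python 2 interpreter would fail at import in this way. That is an assumption; the thread does not say which interpreter was active. A minimal check, with hypothetical helper names:

```python
import sys


def supports_starred_unpacking(version_info=sys.version_info):
    """Return True if this interpreter version can parse `a, *rest = seq`."""
    return version_info[0] >= 3


def split_head(values):
    # Same unpacking shape as the failing line: two names, then a starred rest.
    first, second, *rest = values
    return first, second, rest


# Report the running interpreter and whether it accepts the syntax.
print(sys.version.split()[0], supports_starred_unpacking())
```

Running this with the same interpreter that launches cnvkit.py shows whether the environment is on Python 3; on Python 2 the file itself would fail to parse, which is the same symptom.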

etal commented 4 years ago

Thanks for the feedback. I'll roll another release for the sake of getting the latest fixes out to the world, and then see about replicating and documenting the process of creating the gene resource and cnv-expression correlates.

SJRussell commented 4 years ago

Much appreciated. Do you have any suggestions for cleaning up the calls I'm getting? So far I've tried specifying normal samples and using --no-txlen, --max-log2 2, and segment -m none.

The attached PDFs were generated with 3 normal samples specified, using counts as input, and with the rest of the parameters at their defaults. The total input was 15 RNA-seq samples. As you can see, the XY normal sample segments are still quite variable. In the -16 samples, there is a clear decrease in log2 for chromosome 16. However, in the XO samples there is no clear decrease for the X chromosome (this could be due to dosage compensation, or to the fact that the population contains both XX and XY samples).

Any suggestions on how to bring the baseline closer to 0 and reduce variability? Thanks!

normal1.pdf normal2.pdf normal3.pdf XO-1.pdf XO-2.pdf XO-3.pdf minus16-1.pdf minus16-2.pdf minus16-3.pdf plus16-1.pdf
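
To make the baseline question above concrete, here is a generic sketch of one common approach: compute each gene's log2 ratio against the median of the normal samples, then shift the sample so its median log2 ratio is 0. This is not CNVkit's import-rna algorithm; the function name `centered_log2_ratios` and the pseudocount of 1 are assumptions chosen for illustration.

```python
import math
from statistics import median


def centered_log2_ratios(sample_counts, normal_counts_list, pseudocount=1.0):
    """Per-gene log2(sample / median of normals), median-centered to 0.

    sample_counts: dict mapping gene -> count for one sample.
    normal_counts_list: list of such dicts, one per normal sample.
    """
    ratios = {}
    for gene, count in sample_counts.items():
        normals = [n[gene] for n in normal_counts_list if gene in n]
        if not normals:
            continue  # no normal reference available for this gene
        ref = median(normals)
        ratios[gene] = math.log2((count + pseudocount) / (ref + pseudocount))
    if not ratios:
        return {}
    # Subtract the sample-wide median so the bulk of genes sits at log2 = 0.
    center = median(ratios.values())
    return {gene: value - center for gene, value in ratios.items()}
```

Median-centering removes a uniform scale difference such as sequencing depth, but it does not address gene-level noise, which is where smoothing or segmentation settings come in.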