Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

Problem on scaffold name (mm10) #27

Closed mzytnicki closed 3 years ago

mzytnicki commented 3 years ago

There is a seemingly new scaffold on mm10 (Mus musculus), with a "bad" name: chrna_GL456050_alt. So far, you can reproduce the result with:

GenomeInfoDb:::fetch_chrom_sizes_from_UCSC("mm10", "http://hgdownload.cse.ucsc.edu/goldenPath")[179, ]

The following command line gives the same result:

wget http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz

The problem arises, for instance, with the function .order_seqlevels, namely on line 24. The reason is that scaffold names are divided into 3 chunks, separated with _. The first chunk should match a chromosome (actually, an ASSEMBLED_MOLECULES). Obviously, it fails here: chrna is not a chromosome name.
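The check described above can be sketched in base R. This is a minimal illustration, not the code of .order_seqlevels: the helper name find_bad_scaffolds and the list of assembled molecules are assumptions made for the example, and two-chunk names such as chrUn_* are deliberately not handled.

```r
# Hypothetical helper, NOT GenomeInfoDb's actual .order_seqlevels code.
# It splits each scaffold name on "_" and reports the names whose first
# chunk is not a known assembled molecule.
find_bad_scaffolds <- function(seqlevels, assembled_molecules) {
    is_scaffold <- grepl("_", seqlevels, fixed = TRUE)
    first_chunk <- sub("_.*$", "", seqlevels[is_scaffold])
    seqlevels[is_scaffold][!(first_chunk %in% assembled_molecules)]
}

# Assumed set of mm10 assembled molecules (chr1..chr19, chrX, chrY, chrM).
mm10_chroms <- paste0("chr", c(1:19, "X", "Y", "M"))

# A few sequence names taken from this thread.
seqlevels <- c("chr1", "chr9_KQ030490_fix", "chr6_GL456054_alt",
               "chrna_GL456050_alt")

bad <- find_bad_scaffolds(seqlevels, mm10_chroms)
print(bad)
```

Only chrna_GL456050_alt is reported, because chrna is the one prefix that does not correspond to a chromosome.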

What should I do here?

Thanks a lot!

Baboon61 commented 3 years ago

Related to this, it is impossible to use Seqinfo(genome="mm10"), which calls .order_seqlevels:

Error in .order_seqlevels(chrom_sizes[, "chrom"]) : !anyNA(m31) is not TRUE
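The message has the shape of a failed stopifnot() on a match() result. Here is a minimal base-R sketch of that failure mode, assuming the scaffold prefixes are matched against the assembled molecules. The variable names are illustrative and are not GenomeInfoDb's internals.

```r
# Illustrative only: not the actual body of .order_seqlevels.
assembled <- paste0("chr", c(1:19, "X", "Y", "M"))  # assumed mm10 chromosomes
prefixes <- c("chr9", "chr6", "chrna")              # first chunk of each scaffold name

# match() returns NA for any prefix that is not an assembled molecule.
m <- match(prefixes, assembled)

# Capture the error instead of aborting, so the message can be inspected.
msg <- tryCatch({
    stopifnot(!anyNA(m))
    "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

A single unmatched prefix is enough to make the assertion fail for the whole genome, which is why one odd name blocks Seqinfo(genome="mm10") entirely.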

skilpinen commented 3 years ago

I found this as well, and I was in contact with the people at UCSC. Here is their understanding of what happened and how to fix it:


Thank you for bringing this issue to our attention that the GenomeInfoDb tool in R can no longer communicate in terms of mm10.

We recently released a patch update to mm10 with a new approach where patch sequences can be seen within the context of the mm10 assembly. This specific sequence can be seen with this link: http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm10&position=chrna_GL456050_alt

With these new patch sequences in place, like one named chr9_KQ030490_fix, in mm10 now users can click the "multi-region" button while browsing and then click in the box below "Show one alternate haplotype or fix patch, placed on its chromosome, using ID:" and paste that sequence name in and click submit to see these new patches in their respective placement on the patched chromosome: http://genome.ucsc.edu/s/brianlee/fix_in_place

These patch sequence releases were not intended to impact users of the mm10 browser, unless they went to seek these sequences. So it is useful to learn about the impact on GenomeInfoDb.

From the Bioconductor GenomeInfoDb GitHub ticket (https://github.com/Bioconductor/GenomeInfoDb/issues/27) it looks as though the program is pulling the data from this location: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz. Data in the /database/ folder is automatically generated from source tables.

One option for the Bioconductor GenomeInfoDb tool is to use an earlier version of a similar file, https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.chrom.sizes, that will not have the _fix sequences or this specific chrna_GL456050_alt sequence, but this would require GenomeInfoDb to make these changes. Also, the mm10.chrom.sizes file is slightly different from chromInfo.txt.gz: chromInfo.txt.gz is gzipped and has one additional column giving the file the data was generated from, in this case the updated "/gbdb/mm10/mm10.2bit" file that allows these new patch sequences to be interactively viewed on the mm10 browser.

Another option is for them to modify the tool to handle these new _fix sequences and unexpected content like chrna_GL456050_alt.
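A rough sketch of what the second option could look like on the client side: read a chromInfo-style table, drop the patch sequences, and keep the two columns that mm10.chrom.sizes has. The table below is a toy stand-in with placeholder sizes (not real mm10 values), the column names follow the UCSC chromInfo layout (chrom, size, fileName), and filtering on the _fix/_alt suffix is an assumption about how the new sequences can be recognized.

```r
# Toy chromInfo-like table. The sizes are placeholders, NOT real mm10 values.
chrom_info <- read.delim(text = paste(
    "chr1\t1000\t/gbdb/mm10/mm10.2bit",
    "chr9_KQ030490_fix\t200\t/gbdb/mm10/mm10.2bit",
    "chrna_GL456050_alt\t300\t/gbdb/mm10/mm10.2bit",
    "chrM\t100\t/gbdb/mm10/mm10.2bit",
    sep = "\n"),
    header = FALSE, col.names = c("chrom", "size", "fileName"),
    stringsAsFactors = FALSE)

# Assumption: patch sequences are exactly the names ending in _fix or _alt.
is_patch <- grepl("_(fix|alt)$", chrom_info$chrom)

# Keep only the two columns found in mm10.chrom.sizes.
chrom_sizes <- chrom_info[!is_patch, c("chrom", "size")]
print(chrom_sizes)
```

Names such as chr1_GL456210_random would pass through this filter untouched, since they end in neither suffix.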

skilpinen commented 3 years ago

In addition to that, I managed to work around this problem in my current project as follows.

Case 1. Previously:

seqlevelsStyle(annotations) <- 'UCSC'

Now:

ucsc.levels <- str_replace(string=paste("chr",seqlevels(annotations),sep=""), pattern="chrMT", replacement="chrM")
seqlevels(annotations) <- ucsc.levels

Case 2. Instead of calling Seqinfo(genome="mm10") I call seqinfo(BSgenome.Mmusculus.UCSC.mm10), which seems to return an equivalent object for my purposes.

I am fully aware that these are not real fixes, but they helped me move forward in my current project despite this complication with UCSC services.
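For reference, the renaming in case 1 can also be written without stringr. This is a minimal base-R sketch on a hypothetical vector of NCBI-style names. It only covers assembled chromosomes; scaffold names need a real mapping rather than a "chr" prefix.

```r
# Hypothetical input: NCBI/Ensembl-style names of assembled chromosomes only.
ncbi_levels <- c("1", "2", "19", "X", "Y", "MT")

# Prefix with "chr", then rename the mitochondrial chromosome.
ucsc_levels <- sub("^chrMT$", "chrM", paste0("chr", ncbi_levels))
print(ucsc_levels)
```

Anchoring the pattern with ^ and $ avoids touching any other name that merely contains "chrMT".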

hpages commented 3 years ago

Not sure where the UCSC folks are going with this but it doesn't look good.

  1. The Gateway page for mm10 explicitly says:

    Note that the UCSC mm10 database contains only the reference strain C57BL/6J.

    And this was the case so far with 66 sequences in the mm10 genome (65 from the C57BL/6J assembly unit + chrM from the non-nuclear assembly unit). All the new sequences they recently added to chromInfo.txt.gz in the mm10 database are from other strains e.g. NOD/MrkTac for chrna_GL456050_alt, NOD/ShiLtJ for chr6_GL456054_alt, etc... So at the very least, the Gateway page should clarify what this new mm10 genome is made of.

  2. Let's say they suddenly decide that mm10 should be based on a mix of strains, then why weren't the files under https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/ updated to reflect that? In particular, why does mm10.2bit still contain the 66 original genomic sequences and where are all the new genomic sequences? The fact that chromInfo.txt.gz now contains sequences for which the underlying genomic sequence is unknown is a situation never seen before for any other UCSC genomes. Very puzzling and confusing.

maximilianh commented 3 years ago

> Not sure where the UCSC folks are going with this but it doesn't look good.

UCSC here. This is not our doing but an update by the Genome Reference Consortium. One can discuss the utility of patches, but from what I know we're just following the common practice of the genomics databases. We're interested in feedback on patch updates in general, on this and other genomes. As always with research tools, it is not easy to predict what they are being used for and whether something that is available in the genome is a real user need. We discussed this in the group for a long time and do not know how we could poll our users, but decided to add the patches so users at least have the possibility of using the patch sequences. We did this years ago for hg38 and it led to few problems and, I think, did not break the R packages.

> All the new sequences they recently added to chromInfo.txt.gz in the mm10 database are from other strains e.g. NOD/MrkTac for chrna_GL456050_alt, NOD/ShiLtJ for chr6_GL456054_alt, etc... So at the very least, the Gateway page should clarify what this new mm10 genome is made of.

Thank you! This page slipped through. We'll update this page this week.

> why weren't the files under https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/ updated

This was a long discussion in the group. There is no perfect solution: stability versus updates. We did not update our existing, old genome fasta files because we know of software pipelines that start with a "wget <ucscFastaUrl>". There are biostars discussions and blog posts that discuss genome versions and mention this URL. They would all instantly be broken if we change these fasta files (a random example is https://gatk.broadinstitute.org/hc/en-us/articles/360035890711-GRCh37-hg19-b37-humanG1Kv37-Human-Reference-Discrepancies, but there are others). Changing our database tables also risked breaking third party software (unfortunately, as we see here), but we thought that that would be less common than if we change these fasta files.

So we decided not to update the existing files and instead put the new fasta files into https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/latest/. This is the same approach as with hg38 a few years ago. It led to few problems.

However, reading the mm10 README, I totally agree that the documentation could highlight much more the new patches. We pushed the page updates a few days ago, we'll update it now.

I really appreciate your feedback on this issue, as you can see, we try to keep things stable and we're sorry for the trouble this is causing here. If you have other improvement ideas, or ideas on how to deal with patches in the future, do not hesitate to let us know.

hpages commented 3 years ago

Thanks for the clarifications. So it seems that what you did was to "resync" mm10 with GRCm38.p6 (was previously based on GRCm38). That's all we needed to know. Note that I didn't find this information anywhere until now (just found it here) and it's not something that can easily be guessed either.

When hg19 was "resynced" with GRCh37.p13 last year (after being based on GRCh37 for more than 10 years) it also broke a few things on the Bioconductor side and created some confusion (especially with the introduction of a 2nd mitochondrial sequence), but at least the change was documented in the README file at https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/ so we knew what was going on and how to fix.

Now with the information that the new mm10 is based on GRCm38.p6, we should be able to do something.

hpages commented 3 years ago

Looks like this update is not just a switch to GRCm38.p6, it also extends the scope of mm10 to include all assembly units from GRCm38.p6. The old mm10 was restricted to assembly units "C57BL/6J" and "non-nuclear".

I'll stop doing guess work. Hopefully the Gateway page and the README file at https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/ will explain what really happened to mm10. Thanks!

maximilianh commented 3 years ago

I see. If hg19 broke Bioconductor, sorry, I misremembered. We should have given you a heads up. And will do if we ever change an assembly again (like everyone else, we feel uneasy about changes to existing assemblies)

As for the README, it appears that the updated mm10 file was written including a long section about the patches, but hasn't been pushed yet. Oops. We'll push it out today.

As for the documentation on which assembly units were included, good point, we'll clarify that.

hpages commented 3 years ago

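Also one mystery is where NCBI is getting the UCSC-style-name from. According to the Full sequence report for GRCm38.p6 (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.26_GRCm38.p6/GCF_000001635.26_GRCm38.p6_assembly_report.txt), the UCSC name for MG132_PATCH (RefSeq accession NW_004450259.3) is expected to be chr9_KB469738v3_fix but AFAICT it's chr9_KB469738_fix.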

Is such mapping available somewhere in the mm10 database? The fact that there's no easy/reliable way to map UCSC sequence names to their corresponding NCBI name has been a recurring issue for many years. Thanks!
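The two competing names differ only in whether the accession version is kept. Here is a small base-R sketch of both conventions, assuming the underlying GenBank accession is KB469738.3 (inferred from the two names above, not taken from the report). The helper ucsc_style_name is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: builds a UCSC-style scaffold name from its parts.
# keep_version = TRUE  turns "KB469738.3" into "KB469738v3"
# keep_version = FALSE drops the ".3" version suffix entirely
ucsc_style_name <- function(chrom, accession, suffix, keep_version) {
    acc <- if (keep_version) {
        sub(".", "v", accession, fixed = TRUE)
    } else {
        sub("\\.[0-9]+$", "", accession)
    }
    paste(chrom, acc, suffix, sep = "_")
}

# Accession assumed for illustration (inferred from the names in this thread).
with_version <- ucsc_style_name("chr9", "KB469738.3", "fix", keep_version = TRUE)
without_version <- ucsc_style_name("chr9", "KB469738.3", "fix", keep_version = FALSE)
print(c(with_version, without_version))
```

Any name mapping derived by rule has to know which of the two conventions a given genome uses, which is one reason an explicit lookup table is more reliable than reconstructing names.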

maximilianh commented 3 years ago

Hi Herve,

Excellent question. I'll forward this to NCBI. We don't know where NCBI gets their UCSC-style-name from, it looks like they create these themselves. I hope they can update this file.

Yes, this is an old problem. To address it, for around 5 years now we have provided a mapping in the table chromAlias. We will update the hg19/hg38/mm10 README files today to point to these tables, thank you for bringing this up. We're often unsure where to document new tables that are not visible in the user interface.

MariaDB [mm10]> select * from chromAlias limit 10;
+----------------------+--------------------+----------+
| alias                | chrom              | source   |
+----------------------+--------------------+----------+
| 1                    | chr1               | assembly |
| CM000994.2           | chr1               | genbank  |
| NC_000067.6          | chr1               | refseq   |
| 10                   | chr10              | assembly |
| CM001003.2           | chr10              | genbank  |
| NC_000076.6          | chr10              | refseq   |
| GL456015.1           | chr10_GL456015_alt | genbank  |
| NT_078651.1          | chr10_GL456015_alt | refseq   |
| UNKNOWN_MMCHR10_CTG3 | chr10_GL456015_alt | assembly |
| KQ030491.1           | chr10_KQ030491_fix | genbank  |
+----------------------+--------------------+----------+
10 rows in set (0.00 sec)

The table is available through the table browser and for download (not yet through the API, I believe, but that would be easy to add if you would like to obtain it via the API):

curl https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromAlias.txt.gz | zcat

The table is used when you enter coordinates and upload custom tracks; most formats now accept NCBI and Ensembl identifiers.
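As an illustration of how such a table can be used for name translation, here is a minimal base-R sketch built on the ten chromAlias rows shown in this comment. The helper name to_ucsc is hypothetical. A real workflow would read the full chromAlias.txt.gz instead of inline text, and this sketch assumes that file uses the same three tab-separated columns.

```r
# The ten chromAlias rows shown in this thread, as tab-separated text.
chrom_alias <- read.delim(text = paste(
    "1\tchr1\tassembly",
    "CM000994.2\tchr1\tgenbank",
    "NC_000067.6\tchr1\trefseq",
    "10\tchr10\tassembly",
    "CM001003.2\tchr10\tgenbank",
    "NC_000076.6\tchr10\trefseq",
    "GL456015.1\tchr10_GL456015_alt\tgenbank",
    "NT_078651.1\tchr10_GL456015_alt\trefseq",
    "UNKNOWN_MMCHR10_CTG3\tchr10_GL456015_alt\tassembly",
    "KQ030491.1\tchr10_KQ030491_fix\tgenbank",
    sep = "\n"),
    header = FALSE, col.names = c("alias", "chrom", "source"),
    colClasses = "character")

# Hypothetical helper: look up the UCSC name for each alias.
# Aliases that are not in the table come back as NA.
to_ucsc <- function(x, alias_table) {
    alias_table$chrom[match(x, alias_table$alias)]
}

mapped <- to_ucsc(c("NC_000067.6", "GL456015.1", "10", "not_an_alias"),
                  chrom_alias)
print(mapped)
```

Reading every column as character matters here: otherwise aliases like "1" and "10" could be parsed as numbers in a table that happens to contain only assembly names.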

Again, thanks for these excellent questions.


hpages commented 3 years ago

Hi Maximilian,

Thanks for the chromAlias reminder. I think I remember stumbling on that table at some point. At the time I put it on our TODO list to maybe start using it instead of the complicated heuristic that we're using to map UCSC sequence names to the NCBI names, although I was not sure it was worth it.

The thing is, we need to be able to solve the mapping problem for a bunch of UCSC genomes (see list here). Some of them are quite old and many of them don't provide the chromAlias table. However, it turns out that our complicated heuristic works for all of them, including for those genomes that do not provide the chromAlias table. So there was little incentive for us to implement an alternate resolution mechanism. And even if we were doing so, we would still need to maintain the code behind the complicated heuristic. If chromAlias was available for all the UCSC genomes that are based on an NCBI assembly, it would be a different story.

H.

jardplard commented 3 years ago

Hello, I am so sorry if this is off-topic or poorly asked (R novice here), but the org.Mm.eg.db package has also become largely problematic in the last week and has broken some aspects of my analysis pipeline. In seeing all of this discussion, I assume that these issues are related and wanted to bring it to people's attention, but please forgive me if I am incorrect about this.

hpages commented 3 years ago

@jardplard It would be surprising if this was related to the mm10 issue since org.Mm.eg.db doesn't have much to do with mm10. The only thing they have in common is that both are about Mouse but that's pretty much it. Anyway it's hard to know for sure since you're not telling us what problem you are seeing with org.Mm.eg.db. Can you be more specific? In particular it would help if you could provide a minimal self-contained reproducible example followed by your sessionInfo(). Thanks!

maximilianh commented 3 years ago

@hpages Hi Hervé, I quickly looked over your heuristics. I'm impressed and also a little sad that you had to spend so much effort on mapping sequence names. I can say that we are providing chromAlias tables for all new genomes. I will enquire now if we can make them for all old genomes.

And, just for me to understand the context better: I wonder what the reason is that you need NCBI/UCSC mapping, especially a long time ago. Is the use case that Bioconductor users download NCBI Refseq annotations in GFF format and also pull in annotations from UCSC for the same sequence? But these annotations are pretty recent. Was the typical use case something else?

mzytnicki commented 3 years ago

@maximilianh I am sorry to barge in. I am the one who opened the issue. To answer you: yes, we often need to translate UCSC/NCBI reference names. Very frequently, and it is a curse. A simple use case: we get sequencing data that have been mapped to the NCBI references (by the company that did the sequencing, for instance). Now I want to compare the reads with some annotation, and I use the dedicated package, e.g. TxDb.Mmusculus.UCSC.mm10.knownGene. I need to translate the references. Note that this can be true for "old" genomes, if I reuse data produced long ago. In general, some packages silently convert references to UCSC, especially to be consistent, and use dedicated databases. So, yes, this is a keystone operation.

maximilianh commented 3 years ago

@mzytnicki Thanks, this makes complete sense, I should have thought of this.

@hpages: We updated https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/ now and added an explanation of the strain patches and also point to the chromAlias table. The mm10 genome description page is still being modified.

hpages commented 3 years ago

@maximilianh Thanks for the updates.

Yes, as @mzytnicki said, people use this all the time. Every time they need to compare 2 objects that use different sequence naming conventions (e.g. a BAM file with NCBI sequence names and a set of UCSC transcripts in a TxDb object), they need to harmonize the names by switching one object to the convention of the other. The central tool for this is GenomeInfoDb::seqlevelsStyle() e.g.:

library(TxDb.Hsapiens.UCSC.hg18.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene

seqinfo(txdb)
# Seqinfo object with 49 sequences (1 circular) from hg18 genome:
#   seqnames      seqlengths isCircular genome
#   chr1           247249719       <NA>   hg18
#   chr2           242951149       <NA>   hg18
#   chr3           199501827       <NA>   hg18
#   chr4           191273063       <NA>   hg18
#   chr5           180857866       <NA>   hg18
#   ...                  ...        ...    ...
#   chr19_random      301858       <NA>   hg18
#   chr21_random     1679693       <NA>   hg18
#   chr22_h2_hap1      63661       <NA>   hg18
#   chr22_random      257318       <NA>   hg18
#   chrX_random      1719168       <NA>   hg18

seqlevelsStyle(txdb) <- "NCBI"
# Warning message:
# In (function (seqlevels, genome, new_style)  :
#   cannot switch some of hg18's seqlevels from UCSC to NCBI style

seqinfo(txdb)
# Seqinfo object with 49 sequences (1 circular) from 2 genomes (NCBI36, hg18):
#   seqnames       seqlengths isCircular genome
#   1               247249719       <NA> NCBI36
#   2               242951149       <NA> NCBI36
#   3               199501827       <NA> NCBI36
#   4               191273063       <NA> NCBI36
#   5               180857866       <NA> NCBI36
#   ...                   ...        ...    ...
#   chr19_random       301858       <NA>   hg18
#   chr21_random      1679693       <NA>   hg18
#   Hs22_111678_36      63661       <NA> NCBI36
#   chr22_random       257318       <NA>   hg18
#   chrX_random       1719168       <NA>   hg18

table(genome(seqinfo(txdb)))
# 
#   hg18 NCBI36 
#     23     26 

Same thing if they need to switch the naming style of a BSgenome object:

library(BSgenome.Hsapiens.UCSC.hg18)
genome <- BSgenome.Hsapiens.UCSC.hg18

seqinfo(genome)
# Seqinfo object with 49 sequences (1 circular) from hg18 genome:
#   seqnames     seqlengths isCircular genome
#   chr1          247249719      FALSE   hg18
#   chr2          242951149      FALSE   hg18
#   chr3          199501827      FALSE   hg18
#   chr4          191273063      FALSE   hg18
#   chr5          180857866      FALSE   hg18
#   ...                 ...        ...    ...
#   chr18_random       4262      FALSE   hg18
#   chr19_random     301858      FALSE   hg18
#   chr21_random    1679693      FALSE   hg18
#   chr22_random     257318      FALSE   hg18
#   chrX_random     1719168      FALSE   hg18

seqlevelsStyle(genome) <- "NCBI"
# Warning message:
# In (function (seqlevels, genome, new_style)  :
#   cannot switch some of hg18's seqlevels from UCSC to NCBI style

seqinfo(genome)
# Seqinfo object with 49 sequences (1 circular) from 2 genomes (NCBI36, hg18):
#   seqnames     seqlengths isCircular genome
#   1             247249719      FALSE NCBI36
#   2             242951149      FALSE NCBI36
#   3             199501827      FALSE NCBI36
#   4             191273063      FALSE NCBI36
#   5             180857866      FALSE NCBI36
#   ...                 ...        ...    ...
#   chr18_random       4262      FALSE   hg18
#   chr19_random     301858      FALSE   hg18
#   chr21_random    1679693      FALSE   hg18
#   chr22_random     257318      FALSE   hg18
#   chrX_random     1719168      FALSE   hg18

H.

hpages commented 3 years ago

@mzytnicki I resynced mm10 with GRCm38.p6 in GenomeInfoDb 1.29.3 (BioC 3.14, devel) yesterday. See commit a92e3979526f18812b5e6448d36271ab69aac690. GenomeInfoDb 1.29.3 successfully passed R CMD build and R CMD check on all platforms on today's build report and will become available to BioC 3.14 users in a couple of hours via BiocManager::install().

I've also pushed the fix in release (BioC 3.13) in GenomeInfoDb 1.28.1. It will become available to BioC 3.13 users later today.

maximilianh commented 3 years ago

The following assemblies do not have a chromAlias table:

anoCar1 anoGam1 apiMel1 apiMel2 apiMel3 aplCal1 aptMan1 bosTau1 bosTau2 bosTau3 bosTau4 bosTau5 bosTauMd3 braFlo1 caeJap1 caePb1 caePb2 caeRem2 caeRem3 calJac1 calJac4 canFam1 canFam2 casCan1 cavPor2 cb1 cb3 cbJul2002 ce10 ce2 ce4 ce6 chrPic1 ci2 criGri1 danRer1 danRer2 danRer3 danRer4 danRer5 danRer6 dasNov1 dm1 dm2 dm3 dp2 dp3 dp4 droAna1 droAna2 droEre1 droGri1 droMoj1 droMoj2 droSim1 droVir1 droVir2 droWil1 droYak1 droYak2 eboVir3 equCab1 felCat3 felCat4 fr1 galGal2 galGal3 gorGor5 hg16 hg17 hg18 hg24may2000 hg4 hg5 hg6 hg7 hg8 loxAfr1 mm1 mm2 mm3 mm4 mm5 mm6 mm7 mm8 mm9 monDom1 monDom2 monDom4 nasLar1 nomLeu2 oryCun1 otoGar1 oviAri1 panTro1 panTro2 papHam1 petMar1 priPac1 rheMac1 rheMac2 rheMac3 rn1 rn2 rn3 rn4 rn7 rnJan2003 rnJun2003 sacCer1 sacCer2 sc1 scApr2003 strPur1 strPur2 susScr2 tetNig1 triCas2 turTru2 venter1 vicPac2 xenTro1 xenTro2

Most of them are very old, from before 2010. The reason these don't have a chromAlias table is that we didn't get the sequence from NCBI, which means that we don't have a GCA or GCF assembly to compare to.

If you spot a particular one that we should investigate, we can have a look but I think we agree that providing chromAlias for all assemblies is not possible.

jardplard commented 3 years ago

@hpages that is useful to know, thank you. The reason I suspected these issues were related was that calling basic info for org.Mm.eg.db shows some relation to UCSC ("GPSOURCENAME", see screenshot.)

In my specific example, I am attempting to annotate ATAC-Seq peaks (and read counts for samples at each peak) with the nearest known gene locus, using a package called ChIPpeakAnno. This process involves both EnsDb.Mmusculus.v79 and org.Mm.eg.db. This code has worked perfectly for me multiple times in the past month other than now.

library(ChIPpeakAnno)
library("EnsDb.Mmusculus.v79")
library(org.Mm.eg.db)

#import counts data, convert to a GRanges object
rawCounts <- read.csv('CD8CountsFin.csv', header=1)
geneCounts <- makeGRangesFromDataFrame(rawCounts, keep.extra.columns=TRUE, ignore.strand=TRUE, seqinfo=NULL, seqnames.field=c("chr"), start.field=c("start"), end.field=c("end"), strand.field="strand", starts.in.df.are.0based=FALSE)

#import annotation data (this used to be taken straight from EnsDb.Mmusculus.v79, but I am using
#csv data since this is down at the moment)
annoDatacsv <- read.csv('annocsv2.csv', header=1)
annoData <- makeGRangesFromDataFrame(annoDatacsv)

#match seqlevel style and annotate peaks
seqlevelsStyle(geneCounts) <- seqlevelsStyle(annoData)
countsAnno <- annotatePeakInBatch(geneCounts, AnnotationData=annoData)

#Add ensembl gene ID and symbol
countsAnno <- addGeneIDs(countsAnno, orgAnn="org.Mm.eg.db", 
                  feature_id_type="ensembl_gene_id",
                   IDs2Add=c("symbol"))
#the above function returns "Error: No entrez identifier can be mapped by input data based on the feature_id_type.
#Please consider to use correct feature_id_type, orgAnn or annotatedPeak"

And my sessionInfo:

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gplots_3.1.1                ggplot2_3.3.5               DESeq2_1.30.1              
 [4] SummarizedExperiment_1.20.0 MatrixGenerics_1.2.1        matrixStats_0.59.0         
 [7] Repitools_1.36.0            org.Mm.eg.db_3.12.0         EnsDb.Mmusculus.v79_2.99.0 
[10] ensembldb_2.14.1            AnnotationFilter_1.14.0     GenomicFeatures_1.42.3     
[13] AnnotationDbi_1.52.0        Biobase_2.50.0              ChIPpeakAnno_3.24.2        
[16] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7         IRanges_2.24.1             
[19] S4Vectors_0.28.1            BiocGenerics_0.36.1         tidyr_1.1.3                
[22] dplyr_1.0.7                 biomaRt_2.46.3             

loaded via a namespace (and not attached):
  [1] colorspace_2.0-2         ellipsis_0.3.2           DNAcopy_1.64.0           futile.logger_1.4.3     
  [5] XVector_0.30.0           rstudioapi_0.13          affyio_1.60.0            bit64_4.0.5             
  [9] fansi_0.5.0              xml2_1.3.2               splines_4.0.3            cachem_1.0.5            
 [13] geneplotter_1.68.0       knitr_1.33               Rsamtools_2.6.0          annotate_1.68.0         
 [17] cluster_2.1.2            vsn_3.58.0               dbplyr_2.1.1             png_0.1-7               
 [21] graph_1.68.0             BiocManager_1.30.16      compiler_4.0.3           httr_1.4.2              
 [25] assertthat_0.2.1         Matrix_1.3-4             fastmap_1.1.0            lazyeval_0.2.2          
 [29] limma_3.46.0             formatR_1.11             prettyunits_1.1.1        tools_4.0.3             
 [33] gtable_0.3.0             glue_1.4.2               GenomeInfoDbData_1.2.4   affy_1.68.0             
 [37] rappdirs_0.3.3           Rcpp_1.0.6               vctrs_0.3.8              Biostrings_2.58.0       
 [41] multtest_2.46.0          preprocessCore_1.52.1    rtracklayer_1.50.0       xfun_0.24               
 [45] stringr_1.4.0            lifecycle_1.0.0          gtools_3.9.2             XML_3.99-0.6            
 [49] edgeR_3.32.1             zlibbioc_1.36.0          MASS_7.3-54              scales_1.1.1            
 [53] BSgenome_1.58.0          hms_1.1.0                ProtGenerics_1.22.0      RBGL_1.66.0             
 [57] lambda.r_1.2.4           RColorBrewer_1.1-2       yaml_2.2.1               curl_4.3.2              
 [61] memoise_2.0.0            stringi_1.6.2            RSQLite_2.2.7            genefilter_1.72.1       
 [65] caTools_1.18.2           Ringo_1.54.0             BiocParallel_1.24.1      truncnorm_1.0-8         
 [69] rlang_0.4.11             pkgconfig_2.0.3          bitops_1.0-7             Rsolnp_1.16             
 [73] lattice_0.20-44          purrr_0.3.4              GenomicAlignments_1.26.0 bit_4.0.4               
 [77] tidyselect_1.1.1         magrittr_2.0.1           R6_2.5.0                 generics_0.1.0          
 [81] DelayedArray_0.16.3      DBI_1.1.1                withr_2.4.2              gsmoothr_0.1.7          
 [85] pillar_1.6.1             survival_3.2-11          KEGGREST_1.30.1          RCurl_1.98-1.3          
 [89] tibble_3.1.2             crayon_1.4.1             futile.options_1.0.1     KernSmooth_2.23-20      
 [93] utf8_1.2.1               BiocFileCache_1.14.0     progress_1.2.2           locfit_1.5-9.4          
 [97] grid_4.0.3               blob_1.2.1               xtable_1.8-4             VennDiagram_1.6.20      
[101] regioneR_1.22.0          openssl_1.4.4            munsell_0.5.0            askpass_1.1  

Attachments: annocsv2.csv, CD8CountsFin.csv

[Screenshot: Screen Shot 2021-07-02 at 6 10 26 PM]

mzytnicki commented 3 years ago

I will close the issue for now. It is possible that it triggered another error somewhere else, but it is another story... Thanks to everyone!