File /usr/local/lib/python3.8/dist-packages/scipy/sparse/_data.py:324, in _minmax_mixin.max(self, axis, out)
294 def max(self, axis=None, out=None):
295 """
296 Return the maximum of the matrix or maximum along an axis.
297 This takes all elements into account, not just the non-zero ones.
(...)
322
323 """
--> 324 return self._min_or_max(axis, out, np.maximum)
File /usr/local/lib/python3.8/dist-packages/scipy/sparse/_data.py:214, in _minmax_mixin._min_or_max(self, axis, out, min_or_max)
211 axis += 2
213 if (axis == 0) or (axis == 1):
--> 214 return self._min_or_max_axis(axis, min_or_max)
215 else:
216 raise ValueError("axis out of range")
File /usr/local/lib/python3.8/dist-packages/scipy/sparse/_data.py:166, in _minmax_mixin._min_or_max_axis(self, axis, min_or_max)
164 N = self.shape[axis]
165 if N == 0:
--> 166 raise ValueError("zero-size array to reduction operation")
167 M = self.shape[1 - axis]
169 mat = self.tocsc() if axis == 0 else self.tocsr()
ValueError: zero-size array to reduction operation
SOLUTION:
What is happening is a mismatch between the feature names in your transcriptome file and those in your AnnData objects. This happened to me when I pulled NCBI RefSeq data and used it both as the reference genome for cellranger's counting software (the RefSeq $organism_genomic.fna and $organism_genomic.gtf files) and as input to 'map_genes.sh' (the RefSeq $organism_cds_from_genomic.fna files). cellranger names its features using the 'geneid' field, but the pairwise tblastx hits use the first field of the FASTA file's subject header; for NCBI RefSeq data, that is the 'local' field (i.e. 'lcl|$whatever').
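A quick way to confirm this kind of mismatch is to compare the two sets of names directly. The snippet below is only a sketch: 'report_overlap' is a made-up helper, and the example names are invented. In practice the first list would come from something like adata.var_names, and the second from the ID columns of the tblastx output. If nothing is shared, that would be consistent with the empty matrix behind the error above.

```python
def report_overlap(adata_features, blast_ids, preview=5):
    """Summarise how many feature names two sources have in common.

    adata_features: names as they appear in the AnnData object
    blast_ids:      names as they appear in the BLAST hit tables
    (Both are plain iterables of strings; this helper is hypothetical.)
    """
    adata_set = set(adata_features)
    blast_set = set(blast_ids)
    return {
        "shared": len(adata_set & blast_set),
        # Show a few unmatched names from each side to make the
        # naming scheme of each source visible at a glance.
        "only_in_anndata": sorted(adata_set - blast_set)[:preview],
        "only_in_blast": sorted(blast_set - adata_set)[:preview],
    }


# Invented example names illustrating a gene-ID vs. 'lcl|...' mismatch.
adata_features = ["LOC100", "LOC101", "actb"]
blast_ids = ["lcl|NC_000001.1_cds_1", "lcl|NC_000001.1_cds_2"]

print(report_overlap(adata_features, blast_ids))
```

A "shared" count of 0, with one side showing gene IDs and the other 'lcl|' prefixed names, points to the naming problem described here.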
This can be solved fairly easily by using gffread to generate a properly formatted transcriptome as input to 'map_genes.sh'. It will look something like
Note: the last sed command converts underscore '_' characters to dashes '-' because I used the Seurat and sceasy R packages to generate the AnnData '$whatever.h5ad' files, and that process automatically converts underscores to dashes.
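The underscore-to-dash renaming described in the note can also be done in Python if that is more convenient than sed. This is a minimal sketch, not the command I actually ran: 'dash_fasta_ids' is a made-up name, and it assumes the feature ID is the first whitespace-delimited token of each FASTA header line. Only that token is changed; descriptions and sequence lines are left alone.

```python
import io


def dash_fasta_ids(lines):
    """Yield FASTA lines with '_' replaced by '-' in each record ID.

    Assumes the ID is the first whitespace-delimited token after '>'.
    """
    for line in lines:
        if line.startswith(">"):
            # Split the header into the ID and the optional description.
            parts = line[1:].rstrip("\n").split(None, 1)
            if parts:
                parts[0] = parts[0].replace("_", "-")
            yield ">" + " ".join(parts) + "\n"
        else:
            # Sequence lines are passed through untouched.
            yield line


# Invented two-record FASTA used only to demonstrate the renaming.
example = io.StringIO(">gene_A_1 some_description\nATGC\n>geneB\nGGCC\n")
print("".join(dash_fasta_ids(example)), end="")
```

For a real file, pass an open file handle instead of the StringIO object and write the yielded lines to a new FASTA file.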