MGI-tech-bioinformatics / DNBelab_C_Series_HT_scRNA-analysis-software

An open-source and flexible pipeline to analyze high-throughput DNBelab C Series single-cell RNA datasets
MIT License

Output matrix with gene_id #99

Open ShuyangXu opened 3 months ago

ShuyangXu commented 3 months ago

Hi, MGI dev group,

  1. I noticed you mentioned that the new version can deal with gene_id,

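         In version 2.0.0, genes that have only a gene_id and no gene_name could not be annotated, but we changed this in version 2.0.5; you can update the pipeline and rerun the analysis.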

_Originally posted by @lishuangshuang0616 in https://github.com/MGI-tech-bioinformatics/DNBelab_C_Series_HT_scRNA-analysis-software/issues/8#issuecomment-1250766251_

yet there is no option to do so. (https://github.com/MGI-tech-bioinformatics/DNBelab_C_Series_HT_scRNA-analysis-software/blob/version2.0/doc/scRNA_para.md)

After testing, I found that features.tsv.gz outputs 'gene_id' only when all 'gene_name' attributes are deleted from the GTF. Could you add an option to control the gene ID/name output mode, or output both, as Cell Ranger does?

  2. dnbc4tools rna mkref creates a ref.json file with the key chrmt, whose value corresponds to dnbc4tools rna data --chrMT. However, the help messages for the three options below are identical, so I can't tell how this value affects the result.
dnbc4tools rna data -h

...
--genomeDir PATH     Path to the directory where genome files are stored.
--gtf PATH           Path to the directory where genome files are stored.
--chrMT PATH         Path to the directory where genome files are stored.
dnbc4tools -v

2.1.2
lishuangshuang0616 commented 3 months ago

The previous question was about what to do when there is only a gene_id and no gene_name, for instance in NCBI's GTF file, where the missing gene_name tag made it impossible to annotate genes. We changed the behavior so that when gene_name is missing, gene_id is used as a substitute. This does not mean that the software generates a three-column file similar to Cell Ranger's features.tsv.gz. Currently, our software does not support generating a three-column features.tsv.gz. To read the matrix, please refer to the instructions at the bottom of the quick start guide.
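
For readers who want to see the fallback spelled out, here is a minimal Python sketch. It is not the dnbc4tools implementation. The helper names (parse_attributes, feature_label, three_column_row) and the example GTF attribute strings are hypothetical, and the third column value "Gene Expression" follows the Cell Ranger convention rather than anything this pipeline emits.

```python
import re

# Matches GTF attribute pairs such as: gene_id "ENSG00000000003";
_ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(attr_field):
    """Return the quoted key/value pairs from column 9 of a GTF line."""
    return dict(_ATTR_RE.findall(attr_field))


def feature_label(attrs):
    """Use gene_name when present, otherwise fall back to gene_id."""
    return attrs.get("gene_name") or attrs["gene_id"]


def three_column_row(attrs, feature_type="Gene Expression"):
    """Build a Cell Ranger style row (hypothetical post-processing helper)."""
    return (attrs["gene_id"], feature_label(attrs), feature_type)


# Made-up attribute strings, for illustration only.
with_name = parse_attributes('gene_id "ENSG00000000003"; gene_name "TSPAN6";')
id_only = parse_attributes('gene_id "LOC100"; transcript_id "XM_1";')

print(feature_label(with_name))   # gene_name is available
print(feature_label(id_only))     # falls back to gene_id
print("\t".join(three_column_row(id_only)))
```

A table built this way from the same GTF used for mkref could be joined to the pipeline's single-column feature file outside the pipeline, assuming the labels match the fallback rule described above.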

Regarding the second question: to analyze RNA data directly, you can use the command dnbc4tools rna run. If you need to run only a specific step, you can use the --process flag.