Closed ttab963 closed 3 years ago
Hi Paek,
I think there are a few paths that are incorrect. The -i option needs to be a full path to scsnv_index folder (that you indexed with scsnv index). The other issue I see is that the -g option should be to an index of the whole genome you used to generate the scsnv_index, ie. bwa index /path/to/genome.fa and the -g option would then be /path/to/genome.fa
Finally the -c option is meant to be a basic text file that is used to quantify specific groups of genes such as mitochondrial genes. It is entirely optional and you can leave it out. An example would be something like:
gene_id<tab>group
ENSG00000211459 MT
scsnv map -l V2 -i /data/users/sjp/Projects/scLUPUS/scsnv_index/scsnv_index -g /path/to/genome.fa -b /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/barcode -t 24 --bam-write 4 -q 4 -o /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2 /data/rawData/scLUPUS/fastq/OM-HL-007-UC2
Thanks for your kind answer, but another error comes again. I change command like this,
scsnv map -l V2 -i /data/users/sjp/Projects/scLUPUS/scsnv_index/scsnv_index -g /data/ref/tenX/refdata-gex-GRCh38-2020-A/fasta/genome.fa \ -b /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/barcode -t 24 --bam-write 4 -q 4 \ -o /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2 \ /data/rawData/scLUPUS/fastq/OM-HL-007-UC2
But somethings wrong.
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::bwa_idx_load_from_disk] fail to locate the index files
[00:00:00] Loaded 222696 known barcodes from /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/barcode_counts.txt.gz
Index load failed`
I'm totally not friendly with those bwa files, but i think there are no problem(.sa, .amb, .ann, .pac, .bwt).
Thanks again, Paek
It seems like the tool can't find one of the BWA indices. I also noticed that you mentioned you are using a 5p library. For most of the commands you need to specify the library type with -l V3_5P
or -l V2_5p
depending on which library you are using.
Can you verify the contents of the folder `/data/users/sjp/Projects/scLUPUS/scsnv_index/
It should have these files:
scsnv_index_bwa.amb
scsnv_index_bwa.ann
scsnv_index_bwa.bwt
scsnv_index_bwa.pac
scsnv_index_bwa.sa
scsnv_index_genes.txt.gz
scsnv_index_lenghts.txt
scsnv_index_transcripts.fa.gz
scsnv_index_transcripts.txt.gz
The other folder /data/ref/tenX/refdata-gex-GRCh38-2020-A/fasta/
should have:
genome.fa
genome.fa.amb
genome.fa.ann
genome.fa.bwt
genome.fa.pac
genome.fa.sa
Edit:
I have updated the documentation to reflect the library settings and gene_groups file.
It works so good! Thanks again for your response!!
Hello again.
Sorry for reopen, but scsnvmisc cells makes another problems. I made summary.h5 file and it contained like this.
HDF5 "/data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/OM-HL-007-UC2.summary.h5" {
GROUP "/" {
DATASET "barcode_ids" {
DATATYPE H5T_STD_U32LE
DATASPACE SIMPLE { ( 222696 ) / ( 222696 ) }
DATA {
(0): 177804, 142753, 6495, 14469, 104697, 44792, 14336, 50210, 151124,
(9): 150376, 207310, 94455, 57673, 183673, 48768, 125208, 26413, 89443,
.....
and run command is
scsnvmisc cells -o /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/ /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/OM-HL-007-UC2.summary.h5
but error message came like this.
Traceback (most recent call last):
File "/data/anaconda3/bin/scsnvmisc", line 4, in <module>
__import__('pkg_resources').run_script('scsnvpy==1.0', 'scsnvmisc')
File "/data/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 665, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/data/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1463, in run_script
exec(code, namespace, namespace)
File "/data/anaconda3/lib/python3.8/site-packages/scsnvpy-1.0-py3.8-linux-x86_64.egg/EGG-INFO/scripts/scsnvmisc", line 37, in <module>
cmds.COMMANDS[cmd](args)
File "/data/anaconda3/lib/python3.8/site-packages/scsnvpy-1.0-py3.8-linux-x86_64.egg/scsnvpy/cells.py", line 201, in cells_cmd
krates = brates[brates.molecules >= 1].copy()
File "/data/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5478, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'molecules'
I'm sorry to ask about too minor things, but this tool is so fascinating to give up.
Thanks.
No need to apologize. I want to make sure the tool works for everyone who wishes to use it. It's a strange error as that field should always be available. Can you run the command below and let me know if you get a similar output as to what I have included below?
h5ls /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/OM-HL-007-UC2.summary.h5/barcode_rates
When I try it on a file I have processed I see:
align_ambiguous Dataset {521359}
align_antisense Dataset {521359}
align_barcode_corrected Dataset {521359}
align_cdna Dataset {521359}
align_intergenic Dataset {521359}
align_intronic Dataset {521359}
align_multimapped Dataset {521359}
align_tag_qa_fail Dataset {521359}
align_total_reads Dataset {521359}
align_umi_qa_fail Dataset {521359}
align_unmapped Dataset {521359}
discarded_reads Dataset {521359}
field_order Dataset {23}
genes Dataset {521359}
group_MT Dataset {521359}
intron_molecules Dataset {521359}
intron_pcr_dups Dataset {521359}
intron_reads Dataset {521359}
molecules Dataset {521359}
pcr_dups Dataset {521359}
reads Dataset {521359}
Thanks, but it looks similar with yours. 'h5ls /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/OM-HL-007-UC2.summary.h5/' is this.
barcode_ids Dataset {222696}
barcode_rates Group
barcodes Dataset {222696}
exonic Group
gene_ids Dataset {23713}
gene_names Dataset {23713}
intronic Group
and 'h5ls /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/OM-HL-007-UC2.summary.h5/barcode_rates' is this.
align_ambiguous Dataset {222696}
align_antisense Dataset {222696}
align_barcode_corrected Dataset {222696}
align_cdna Dataset {222696}
align_intergenic Dataset {222696}
align_intronic Dataset {222696}
align_multimapped Dataset {222696}
align_tag_qa_fail Dataset {222696}
align_total_reads Dataset {222696}
align_umi_qa_fail Dataset {222696}
align_unmapped Dataset {222696}
discarded_reads Dataset {222696}
field_order Dataset {19}
genes Dataset {222696}
intron_molecules Dataset {222696}
intron_pcr_dups Dataset {222696}
intron_reads Dataset {222696}
molecules Dataset {222696}
pcr_dups Dataset {222696}
reads Dataset {222696}
I check this file in python, and barcode_rates/molecules has data like [46993 39395 36750 ... 0 0 0]. Could it be a HDF5 C and C++ libraries problem?
Thanks for your patience and help figuring out this issue.
I suspect it may be a string encoding issue with python. Pandas has issues with columns named after byte strings and I think on your system the strings are bytes. I pushed a few changes to the code to resolve this. If you can clone the repository and compile the cmake and python code again, it should hopefully resolve the issue.
Sorry for late. I had to change my envs for other reasons, and recompile your code. And unfortunately, there's a another issue but it seems to get problem with HDF5.
cmake -DCMAKE_BUILD_TYPE=Release ..
This command goes wrong.
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /data/anaconda3/bin/x86_64-conda-linux-gnu-cc
-- Check for working C compiler: /data/anaconda3/bin/x86_64-conda-linux-gnu-cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /data/anaconda3/bin/x86_64-conda-linux-gnu-c++
-- Check for working CXX compiler: /data/anaconda3/bin/x86_64-conda-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- HDF5: Using hdf5 compiler wrapper to determine C configuration
-- HDF5: Using hdf5 compiler wrapper to determine CXX configuration
CMake Error at /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
Could NOT find HDF5 (missing: HDF5_HL_LIBRARIES) (found version "1.10.6")
Call Stack (most recent call first):
/usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:393 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.16/Modules/FindHDF5.cmake:929 (find_package_handle_standard_args)
CMakeLists.txt:14 (find_package)
-- Configuring incomplete, errors occurred!
See also "/data/users/sjp/programs/scsnv/build/CMakeFiles/CMakeOutput.log".
See also "/data/users/sjp/programs/scsnv/build/CMakeFiles/CMakeError.log".
(base) sjp@qlab:/data/users/sjp/programs/scsnv/build$ make
make: *** No targets specified and no makefile found. Stop.
I used HDF5 in conda this time. Should I have to use other ways? this cmake way is so difficult to me ...
Thanks again anyway.
There seems to be some issues when trying to compile in a conda environment. I will try to find a work around and let you know.
I have cleaned up the build process and have successfully compiled the main program and the python library in a conda environment. I would delete the scsnv directory and clone it again.
I created the environment with the following command:
conda create -c conda-forge -c bioconda -n SCSNV_TEST gcc_linux-64 gxx_linux-64 cmake hdf5 zlib
Hopefully this resolves your build issues.
Okay, compiling is fine and error that i mention before is passed.
but I realize my summary.h5 file doesn't have "group_MT" and another error comes out actually.
My command is,
scsnvmisc cells -o /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/ \ /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/OM-HL-007-UC2.summary.h5'
/data/anaconda3/lib/python3.8/site-packages/scsnvpy-1.0-py3.8-linux-x86_64.egg/scsnvpy/cells.py:237: MatplotlibDeprecationWarning: The 'subsx' parameter of __init__() has been renamed 'subs' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
ax.set_xscale('log', subsx=[2,3,4,5,6,7,8,9])
Traceback (most recent call last):
File "/data/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'group_MT'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/anaconda3/bin/scsnvmisc", line 4, in <module>
__import__('pkg_resources').run_script('scsnvpy==1.0', 'scsnvmisc')
File "/data/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/data/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1448, in run_script
exec(code, namespace, namespace)
File "/data/anaconda3/lib/python3.8/site-packages/scsnvpy-1.0-py3.8-linux-x86_64.egg/EGG-INFO/scripts/scsnvmisc", line 37, in <module>
cmds.COMMANDS[cmd](args)
File "/data/anaconda3/lib/python3.8/site-packages/scsnvpy-1.0-py3.8-linux-x86_64.egg/scsnvpy/cells.py", line 244, in cells_cmd
MT_cutoff(axs[1], prates, args.mt_key, args.mt_mad, args.mt_max, args.mt_perc)
File "/data/anaconda3/lib/python3.8/site-packages/scsnvpy-1.0-py3.8-linux-x86_64.egg/scsnvpy/cells.py", line 99, in MT_cutoff
pmt = 100.0 * df[key].values / mols
File "/data/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 3455, in __getitem__
indexer = self.columns.get_loc(key)
File "/data/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'group_MT'
There is no error message in previous steps. If this error is because mitocondrial reads were not counted, which point do i wrong? My reference is h38 from 10x.
I'm sorry to bother you this long. Thanks again.
I have added an additional flag to the cell command to skip filtering by MT(%) if you did not specify a groups file with that information.
You can run it with
scsnvmisc cells --skip-mt -o /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/ /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/OM-HL-007-UC2.summary.h5'
Hopefully that resolves that issue.
Finally It works well from beginning to end! Thank you, GWW.
Great I am so glad it has all worked out. Thanks for your patience with all of the issues.
Hello,
I'm trying to scsnv map with 10x 5' V2 data, but it can't read scsnv_index_genes.txt.gz file. This is Run commend & error messages.
scsnv map -l V2 -i scsnv_index -g /data/users/sjp/Projects/scLUPUS/scsnv_index/scsnv_index_bwa.bwt \ -b /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/barcode -t 24 --bam-write 4 -q 4 \ -o /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2 \ -c /data/users/sjp/Projects/scLUPUS/scsnv_index/scsnv_index_genes.txt.gz \ /data/rawData/scLUPUS/fastq/OM-HL-007-UC2
#[00:00:00] Loaded 222696 known barcodes from /data/users/sjp/Projects/scLUPUS/scsnv_count/OM-HL-007-UC2/barcode_counts.txt.gz
#Could not open scsnv_index_genes.txt.gz for reading
Is this scsnv_index_genes.txt right? It has proper information(ref , tid, gid, gene_id, gene_name....).
Thanks, Paek.