Closed yeroslaviz closed 1 year ago
Hi @yeroslaviz!
Can you share the output of uname -a
and the results of the directory created by SoloTE using ls -Rlht SoloTE/output/
?
The input BED and the input BAM files seem to be ok based on the lines you sent. Depending on the files that have generated in the temporary directory, we can further assess what's going on!
Best, Braulio.
Hi, and yhanks for the fats response.
here are the details:
$ uname -a
Linux supercube016 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
and the second one (I'm running soloTE
against two different bam files, but putting the output of only one here, as the same error occurs in both)
$ ls -Rlht SoloTE/output/
SoloTE/output/:
total 2.0K
drwxr-xr-x 2 yeroslaviz b_cox 4.0K May 2 16:27 starsolo_SoloTE_temp
drwxr-xr-x 2 yeroslaviz b_cox 4.0K May 2 16:21 cellranger_SoloTE_temp
SoloTE/output/cellranger_SoloTE_temp:
total 115G
-rw-r--r-- 1 yeroslaviz b_cox 14M May 2 16:23 cellranger_barcodes.tsv
-rw-r--r-- 1 yeroslaviz b_cox 1.7M May 2 16:21 cellranger_genes.tsv
-rw-r--r-- 1 yeroslaviz b_cox 1.7G May 2 16:21 cellranger_allcounts_final.txt
-rw-r--r-- 1 yeroslaviz b_cox 20M May 2 16:21 cellranger_subftes_2.txt
-rw-r--r-- 1 yeroslaviz b_cox 0 May 2 16:21 cellranger_locustes_2.txt
-rw-r--r-- 1 yeroslaviz b_cox 1.7G May 2 16:21 cellranger_genes_2.txt
-rw-r--r-- 1 yeroslaviz b_cox 21M May 2 16:19 cellranger_subftes.txt
-rw-r--r-- 1 yeroslaviz b_cox 0 May 2 16:19 cellranger_locustes.txt
-rw-r--r-- 1 yeroslaviz b_cox 1.7G May 2 16:19 cellranger_genes.txt
-rw-r--r-- 1 yeroslaviz b_cox 1.7G May 2 16:19 cellranger_allcounts.txt
-rw-r--r-- 1 yeroslaviz b_cox 304M May 2 16:19 cellranger_countpercell_2L.counts
-rw-r--r-- 1 yeroslaviz b_cox 344M May 2 16:18 cellranger_countpercell_2R.counts
-rw-r--r-- 1 yeroslaviz b_cox 2.7G May 2 16:17 cellranger_countpercell_2L.counts.tmp
-rw-r--r-- 1 yeroslaviz b_cox 363M May 2 16:17 cellranger_countpercell_3R.counts
-rw-r--r-- 1 yeroslaviz b_cox 320M May 2 16:17 cellranger_countpercell_X.counts
-rw-r--r-- 1 yeroslaviz b_cox 2.6G May 2 16:16 cellranger_countpercell_2R.counts.tmp
-rw-r--r-- 1 yeroslaviz b_cox 303M May 2 16:16 cellranger_countpercell_3L.counts
-rw-r--r-- 1 yeroslaviz b_cox 2.3G May 2 16:15 cellranger_countpercell_3R.counts.tmp
-rw-r--r-- 1 yeroslaviz b_cox 2.4G May 2 16:15 cellranger_countpercell_X.counts.tmp
-rw-r--r-- 1 yeroslaviz b_cox 2.2G May 2 16:14 cellranger_countpercell_3L.counts.tmp
-rw-r--r-- 1 yeroslaviz b_cox 66M May 2 16:10 cellranger_countpercell_mitochondrion_genome.counts
-rw-r--r-- 1 yeroslaviz b_cox 1.4G May 2 16:09 cellranger_countpercell_mitochondrion_genome.counts.tmp
-rw-r--r-- 1 yeroslaviz b_cox 18M May 2 16:03 cellranger_countpercell_4.counts
-rw-r--r-- 1 yeroslaviz b_cox 128M May 2 16:03 cellranger_countpercell_4.counts.tmp
-rw-r--r-- 1 yeroslaviz b_cox 2.5M May 2 16:02 cellranger_countpercell_Y.counts
-rw-r--r-- 1 yeroslaviz b_cox 4.3M May 2 16:02 cellranger_countpercell_Y.counts.tmp
-rw-r--r-- 1 yeroslaviz b_cox 639K May 2 16:02 cellranger_final.bam.bai
-rw-r--r-- 1 yeroslaviz b_cox 18G May 2 15:59 cellranger_final.bam
-rw-r--r-- 1 yeroslaviz b_cox 18G May 2 15:32 cellranger_full_sorted.bam
-rw-r--r-- 1 yeroslaviz b_cox 18G May 2 15:22 cellranger_full.bam
-rw-r--r-- 1 yeroslaviz b_cox 29M May 2 15:21 cellranger_teannotated.bam
-rw-r--r-- 1 yeroslaviz b_cox 1.2M May 2 15:20 cellranger_selectedtes.bed
-rw-r--r-- 1 yeroslaviz b_cox 40M May 2 15:20 cellranger_nogenes_overlappingtes.bed
-rw-r--r-- 1 yeroslaviz b_cox 106K May 2 15:20 cellranger_nogenes_overlappingtes.bam.bai
-rw-r--r-- 1 yeroslaviz b_cox 37M May 2 15:20 cellranger_nogenes_overlappingtes.bam
-rw-r--r-- 1 yeroslaviz b_cox 2.4G May 2 15:20 cellranger_nogenes.bam
-rw-r--r-- 1 yeroslaviz b_cox 18G May 2 15:20 cellranger_genes.bam
-rw-r--r-- 1 yeroslaviz b_cox 858 May 2 15:20 cellranger_nogenes_newtag.bam
-rw-r--r-- 1 yeroslaviz b_cox 2.4G May 2 14:51 cellranger_nogenes_oldtag.bam
-rw-r--r-- 1 yeroslaviz b_cox 18G May 2 14:51 temp.bam
thanks
Hi @yeroslaviz,
It looks like there is a small bug almost at the end of the pipeline, as most intermediate files have been generated without issues.
Can you run the attached script inside the cellranger_SoloTE_temp directory
with this command?
python generatecounts_debug_20230105.py
After running it, please paste here the output.
Hello, I have the same issue and this is the output for me after running python generatecounts_debug_20230105.py
154693 124910 1694304
I managed to get it to work with SoloTE_1.06/SoloTE_pipeline.py
instead of SoloTE_1.08
Hi @fw262 , thanks for the feedback!!
Please put the attached developmental version in the same folder as you have the SoloTE_pipeline.py
script, and run it with the same command (i.e., python SoloTE_developmental_20230503.py (...)
). It should continue from the last steps and not cause the error.
Glad that it worked @fw262 !
python generatecounts_debug_20230105.py
I have run the generatecounts_debug_20230105.py
script and got the following output:
32936 736828 64204077
But running the developmental script resulted in the same error as before:
*****
***** ERROR: Requested column 3, but database file - only has fields 1 - 0.
Creating final results directory
/fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/output/chr2L_SoloTE_output was created
Traceback (most recent call last):
File "/fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/SoloTE_1.08/SoloTE_developmental_20230503.py", line 342, in <module>
file_table = pd.read_table(input_te_file,header=None,sep="\t")
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1289, in read_table
return _read(filepath_or_buffer, kwds)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1753, in _make_engine
return mapping[engine](f, **self.options)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 79, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 554, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
(I have re-run this test on a smaller subset containing only chromosome 2L)
It seems that running the developmental script does create though an output folder with the three needed files.
I still get the error message. I can't say for sure, but this might has something to do with the fact that the features.tsv
file in the output folder contains now only two columns instead of three.
Can this be the problem? I copy-paste the complete output from STDOUT here below, as you can follow it better.
For now I don't know if the results are good enough or if I can use the output files in the downstream analysis. Is it just a formatting problem, or if there are other problems in the analysis.
thanks for the help
Assa
$ python SoloTE/SoloTE_1.08/SoloTE_developmental_20230503.py -t 20 -d SoloTE/output/ -o cellranger -a SoloTE/dm6.soloTE_ChromNames_corrected.bed -b Ctrl_FC/outs/possorted_genome_bam.bam
SoloTE started at 08:31:19
samtools found!
bedtools found!
BAM file generated using CellRanger
SoloTE v1.08 started!
SoloTE Home directory /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/SoloTE_1.08
SoloTE executed from /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons
Results will be stored in /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/output
Input BAM file: /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/Ctrl_FC/outs/possorted_genome_bam.bam
Input TE BED file: /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/dm6.soloTE_ChromNames_corrected.bed
Currently working in temporary directory: /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/output/cellranger_SoloTE_temp
samtools view --threads 20 -d GN -U cellranger_nogenes_oldtag.bam -O BAM -o temp.bam /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/Ctrl_FC/outs/possorted_genome_bam.bam
samtools view -d GN:- -o cellranger_nogenes_newtag.bam -U cellranger_genes.bam -O BAM temp.bam
samtools cat --threads 20 -o cellranger_nogenes.bam cellranger_nogenes_oldtag.bam cellranger_nogenes_newtag.bam
samtools view --threads 20 -O BAM -o cellranger_nogenes_overlappingtes.bam -L /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/dm6.soloTE_ChromNames_corrected.bed cellranger_nogenes.bam
samtools index cellranger_nogenes_overlappingtes.bam
bedtools bamtobed -i cellranger_nogenes_overlappingtes.bam -split > cellranger_nogenes_overlappingtes.bed
bedtools intersect -a /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/dm6.soloTE_ChromNames_corrected.bed -b cellranger_nogenes_overlappingtes.bed -u > cellranger_selectedtes.bed
python /fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/SoloTE_1.08/annotateBAM.py cellranger_nogenes_overlappingtes.bam cellranger_selectedtes.bed cellranger_teannotated.bam
samtools cat --threads 20 -o cellranger_full.bam cellranger_genes.bam cellranger_teannotated.bam
samtools sort --threads 20 -O BAM -o cellranger_full_sorted.bam cellranger_full.bam
[bam_sort_core] merging from 12 files and 20 in-memory blocks...
samtools view -U cellranger_final.bam --tag CB:- -@ 8 cellranger_full_sorted.bam > /dev/null
samtools index cellranger_final.bam
Process: 808376 2L
Process: 808377 2R
Process: 808379 3L
Process: 808378 3R
Process: 808380 4
Process: 808381 mitochondrion_genome
Process: 808382 X
Process: 808383 Y
samtools view --keep-tag GN,CB,UB cellranger_final.bam 3R -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-"'|awk 'BEGIN{FS=OFS="\t"}{for (tag_index=NF-2;tag_index<=NF;tag_index++){ tag_index_s=gensub(/([A-Z]{2}):Z:(.*)/,"\\1\t\\2","g",$tag_index); split(tag_index_s,t,"\t");tag_dict[t[1]]=t[2]}; print tag_dict["GN"],tag_dict["CB"],tag_dict["UB"]}' > cellranger_countpercell_3R.counts.tmp
samtools view --keep-tag GN,CB,UB cellranger_final.bam 2L -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-"'|awk 'BEGIN{FS=OFS="\t"}{for (tag_index=NF-2;tag_index<=NF;tag_index++){ tag_index_s=gensub(/([A-Z]{2}):Z:(.*)/,"\\1\t\\2","g",$tag_index); split(tag_index_s,t,"\t");tag_dict[t[1]]=t[2]}; print tag_dict["GN"],tag_dict["CB"],tag_dict["UB"]}' > cellranger_countpercell_2L.counts.tmp
samtools view --keep-tag GN,CB,UB cellranger_final.bam 2R -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-"'|awk 'BEGIN{FS=OFS="\t"}{for (tag_index=NF-2;tag_index<=NF;tag_index++){ tag_index_s=gensub(/([A-Z]{2}):Z:(.*)/,"\\1\t\\2","g",$tag_index); split(tag_index_s,t,"\t");tag_dict[t[1]]=t[2]}; print tag_dict["GN"],tag_dict["CB"],tag_dict["UB"]}' > cellranger_countpercell_2R.counts.tmp
samtools view --keep-tag GN,CB,UB cellranger_final.bam 3L -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-"'|awk 'BEGIN{FS=OFS="\t"}{for (tag_index=NF-2;tag_index<=NF;tag_index++){ tag_index_s=gensub(/([A-Z]{2}):Z:(.*)/,"\\1\t\\2","g",$tag_index); split(tag_index_s,t,"\t");tag_dict[t[1]]=t[2]}; print tag_dict["GN"],tag_dict["CB"],tag_dict["UB"]}' > cellranger_countpercell_3L.counts.tmp
samtools view --keep-tag GN,CB,UB cellranger_final.bam 4 -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-"'|awk 'BEGIN{FS=OFS="\t"}{for (tag_index=NF-2;tag_index<=NF;tag_index++){ tag_index_s=gensub(/([A-Z]{2}):Z:(.*)/,"\\1\t\\2","g",$tag_index); split(tag_index_s,t,"\t");tag_dict[t[1]]=t[2]}; print tag_dict["GN"],tag_dict["CB"],tag_dict["UB"]}' > cellranger_countpercell_4.counts.tmp
samtools view --keep-tag GN,CB,UB cellranger_final.bam Y -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-"'|awk 'BEGIN{FS=OFS="\t"}{for (tag_index=NF-2;tag_index<=NF;tag_index++){ tag_index_s=gensub(/([A-Z]{2}):Z:(.*)/,"\\1\t\\2","g",$tag_index); split(tag_index_s,t,"\t");tag_dict[t[1]]=t[2]}; print tag_dict["GN"],tag_dict["CB"],tag_dict["UB"]}' > cellranger_countpercell_Y.counts.tmp
samtools view --keep-tag GN,CB,UB cellranger_final.bam mitochondrion_genome -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-"'|awk 'BEGIN{FS=OFS="\t"}{for (tag_index=NF-2;tag_index<=NF;tag_index++){ tag_index_s=gensub(/([A-Z]{2}):Z:(.*)/,"\\1\t\\2","g",$tag_index); split(tag_index_s,t,"\t");tag_dict[t[1]]=t[2]}; print tag_dict["GN"],tag_dict["CB"],tag_dict["UB"]}' > cellranger_countpercell_mitochondrion_genome.counts.tmp
samtools view --keep-tag GN,CB,UB cellranger_final.bam X -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-"'|awk 'BEGIN{FS=OFS="\t"}{for (tag_index=NF-2;tag_index<=NF;tag_index++){ tag_index_s=gensub(/([A-Z]{2}):Z:(.*)/,"\\1\t\\2","g",$tag_index); split(tag_index_s,t,"\t");tag_dict[t[1]]=t[2]}; print tag_dict["GN"],tag_dict["CB"],tag_dict["UB"]}' > cellranger_countpercell_X.counts.tmp
CompletedProcess(args=['wc', '-l', 'cellranger_countpercell_Y.counts'], returncode=0, stdout='46126 cellranger_countpercell_Y.counts\n')
CompletedProcess(args=['wc', '-l', 'cellranger_countpercell_4.counts'], returncode=0, stdout='654118 cellranger_countpercell_4.counts\n')
CompletedProcess(args=['wc', '-l', 'cellranger_countpercell_mitochondrion_genome.counts'], returncode=0, stdout='2290533 cellranger_countpercell_mitochondrion_genome.counts\n')
CompletedProcess(args=['wc', '-l', 'cellranger_countpercell_3L.counts'], returncode=0, stdout='11443591 cellranger_countpercell_3L.counts\n')
CompletedProcess(args=['wc', '-l', 'cellranger_countpercell_X.counts'], returncode=0, stdout='11696933 cellranger_countpercell_X.counts\n')
CompletedProcess(args=['wc', '-l', 'cellranger_countpercell_3R.counts'], returncode=0, stdout='13650943 cellranger_countpercell_3R.counts\n')
CompletedProcess(args=['wc', '-l', 'cellranger_countpercell_2R.counts'], returncode=0, stdout='12907214 cellranger_countpercell_2R.counts\n')
CompletedProcess(args=['wc', '-l', 'cellranger_countpercell_2L.counts'], returncode=0, stdout='11521534 cellranger_countpercell_2L.counts\n')
['cellranger_countpercell_Y.counts', 'cellranger_countpercell_4.counts', 'cellranger_countpercell_mitochondrion_genome.counts', 'cellranger_countpercell_3L.counts', 'cellranger_countpercell_X.counts', 'cellranger_countpercell_3R.counts', 'cellranger_countpercell_2R.counts', 'cellranger_countpercell_2L.counts']
*****
***** ERROR: Requested column 3, but database file - only has fields 1 - 0.
Creating final results directory
/fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/output/cellranger_SoloTE_output was created
Traceback (most recent call last):
File "/fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Plitzko/SvenK_Retrotransposons/SoloTE/SoloTE_1.08/SoloTE_developmental_20230503.py", line 342, in <module>
file_table = pd.read_table(input_te_file,header=None,sep="\t")
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1289, in read_table
return _read(filepath_or_buffer, kwds)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1753, in _make_engine
return mapping[engine](f, **self.options)
File "/fs/home/yeroslaviz/miniconda3/envs/soloTE/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 79, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 554, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
@yeroslaviz can you confirm the number of lines in the following files with wc -l
?
cellranger_subftes_2.txt
cellranger_locustes_2.txt
cellranger_genes_2.txt
From the previous answer, it may seem that one of these files are empty.
Additionally, please add here the output of samtools view -c -q 255 cellranger_teannotated.bam
Hi, yes, you're right the cellranger_locustes_2.txt
file is empty.
What should I expect to see in this file?
also the other results are:
$ wc -l SoloTE/output/cellranger_SoloTE_temp/cellranger_subftes_2.txt
348828 SoloTE/output/cellranger_SoloTE_temp/cellranger_subftes_2.txt
$ wc -l SoloTE/output/cellranger_SoloTE_temp/cellranger_locustes_2.txt
0 SoloTE/output/cellranger_SoloTE_temp/cellranger_locustes_2.txt
$ wc -l SoloTE/output/cellranger_SoloTE_temp/cellranger_genes_2.txt
63855249 SoloTE/output/cellranger_SoloTE_temp/cellranger_genes_2.txt
$ samtools view -c -q 255 SoloTE/output/cellranger_SoloTE_temp/cellranger_teannotated.bam
268090
Does it has anything to do with the fact that I'm working on Drosophila?
Basically, it is indicating that no locus-specific TEs are being found. This is odd, because the reads with MAPQ = 255 should be annotated as locus-specific, and there are 268090 in the TE BAM file. Would it be possible for you to send me the cellranger_teannotated.bam
to inspect it?
Hi, yes of course.
you can find the fiule under this link - https://datashare.biochem.mpg.de/s/a8QtGebXmcBRJ0d
it is approx. 30Mb in size, so not that big.
thanks for the help.
Happy to help, and thanks for sharing the BAM file!
There is a bug for locus-specific TEs. The "chr" word is searched when summarizing these TEs. We will work on updating the pipeline so this doesn't cause errors. In the meantime, please try updating your TE BED file using this command:
awk 'BEGIN{FS=OFS="\t"}{$4="chr"$4; print $0}' dm6.soloTE_ChromNames_corrected.bed > dm6_SoloTE_ChrIdsFixed.bed
It should result in something like this:
2L 9726 9861 chr2L|9726|9861|IDEFIX_LTR:Gypsy:LTR|+ 39 +
2L 9889 9993 chr2L|9889|9993|DNAREP1_DM:Helitron:RC|- 71 -
2L 13255 13311 chr2L|13255|13311|DNAREP1_DM:Helitron:RC|- 16 -
2L 15866 15955 chr2L|15866|15955|DNAREP1_DM:Helitron:RC|- 44 -
2L 24235 24484 chr2L|24235|24484|DNAREP1_DM:Helitron:RC|+ 183 +
2L 27528 27835 chr2L|27528|27835|DNAREP1_DM:Helitron:RC|- 70 -
2L 47515 52544 chr2L|47515|52544|LINEJ1_DM:Jockey:LINE|+ 6385 +
2L 60251 60631 chr2L|60251|60631|DNAREP1_DM:Helitron:RC|+ 295 +
2L 60652 60786 chr2L|60652|60786|DNAREP1_DM:Helitron:RC|+ 75 +
2L 64280 64914 chr2L|64280|64914|BS2:Jockey:LINE|+ 679 +
Then, re-run the pipeline in a new directory.
Let me know how this goes!
Hi,
thanks a lot for all the help. The run was now finished successfully. and the files seems to be complete.
Do you maybe have also another workflow with more details on how to continue now downstream with the Seurat
or singleCellExperiment
packages? something similar to how you showed in your paper?
thanks for everything
Assa
That's great!
As for a downstream analysis, we have a brief protocol described here. Overall, if you are using Single Cell data that has been already annotated, with recommend to use that as metadata to check the clustering results, and do a marker analysis.
thanks for that.
Just a comment about the features file. When trying to convert it into a seurat object it throws some warnings.
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have pipe characters ('|'), replacing with dashes ('-')
I saw that you use a lot of the pipe symbol in the soloTE IDs. maybe they can be replaced before hand?
@yeroslaviz those warnings shouldn't interfere in any of the Seurat analysis.
In theory, generating a BED file with "-" instead of "|" should not interfere with the proper execution of the pipeline, but we wanted to avoid using "-" as some TEs contain that character in their name ("HERVK13-int").
I am having a similar problem as a few other users hee, but can't solve it.
I'm working with the following tools within a conda environment
I'm getting a similar error:
I have made sure, that chromosome names in both files are identical. For the command and input files you can have a look at the attached
Inputfiles.txt
fileany ideas, How I can solve the problem.
thanks
Assa
P.S.
Not sure, if this is relevant. The bed file I have created after downloading the file from UCSC. It also contains all the "Un_" and the "_random" entries, which I don't have in my bam file. The bam files were created using either
cellranger
orSTARsolo
with a genome downloaded from Ensembl with only the main chromosomesDo I somehow need to filter my bed file to contain only the entries with these chromosomes?
Inputfiles.txt