ChangLab / FAST-iCLIP

Fully Automated and Standardized (FAST) iCLIP analysis pipeline.
GNU General Public License v2.0

expected output for example? #24

Open andypohl opened 7 years ago

andypohl commented 7 years ago

I've installed the latest FAST-iCLIP and I'm still having a lot of problems. I'm just trying to run the example command on the example data:

$ fasticlip -i rawdata/example_MMhur_R1.fastq rawdata/example_MMhur_R2.fastq --GRCm38 -s docs/GRCm38/GRCm38_STAR/ -n MMhur -o results

I get:

Result :
Error : 100228 reads; of these:
  100228 (100.00%) were unpaired; of these:
    98946 (98.72%) aligned 0 times
    381 (0.38%) aligned exactly 1 time
    901 (0.90%) aligned >1 times
1.28% overall alignment rate

which looks very poor. There is also a more sinister pandas error just after this:

Process mapped data
Traceback (most recent call last):
  File "fasticlip/retroviralMapping.py", line 150, in <module>
    bedR2=readBed(mappedBed[1])
  File "fasticlip/retroviralMapping.py", line 143, in readBed
    bedFile = pd.read_table(path,dtype=str,header=None)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 523, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5214)
pandas.io.common.EmptyDataError: No columns to parse from file


Perhaps the pandas error is a consequence of the low alignment rate. What is the expected output? I get some output, but no figures are generated because of a matplotlib/Qt error (which I'll try to fix on my end before mentioning it here).

bdo311 commented 7 years ago

Thanks for the comment and sorry you've been running into all of these issues. I've been contributing less to the newer versions of fasticlip but I'll try to answer these as best as I can.

We use bowtie for mapping reads to exogenous retroviruses and tRNA, and STAR for mapping to endogenous retroviruses and the genome. So, these two outputs correspond to viral and tRNA mapping and we should expect a low rate.
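To compare rates across indexes rather than eyeballing one summary, the bowtie2 alignment summary can be parsed directly. The following is a minimal sketch, assuming the summary text has the same shape as the one quoted in this issue; the function name `parse_bowtie2_summary` is hypothetical and not part of fasticlip.

```python
import re


def parse_bowtie2_summary(text):
    """Return (total_reads, overall_alignment_rate_percent) from a bowtie2 summary.

    Hypothetical helper: it only looks for the two phrases that appear in the
    summary quoted in this thread and ignores everything else.
    """
    total = re.search(r"(\d+) reads; of these:", text)
    rate = re.search(r"([\d.]+)% overall alignment rate", text)
    if total is None or rate is None:
        raise ValueError("text does not look like a bowtie2 alignment summary")
    return int(total.group(1)), float(rate.group(1))
```

Applied to each summary in turn, this would show which mapping step a given low rate belongs to.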

The pandas error comes from trying to make plots from the retroviral data. It looks like it's complaining because, as you said, too few reads map, so pandas ends up trying to read an empty file and fails. This is likely just an artifact of us providing small test files; larger files will have enough retroviral reads to make data frames with.
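One way to sidestep this class of failure is to check that a BED file actually contains records before handing it to a parser. The sketch below assumes nothing about fasticlip internals beyond what the traceback shows; `bed_has_records` is a hypothetical name, not an existing function in the package.

```python
import os


def bed_has_records(path):
    """Return True if `path` exists and holds at least one non-blank line.

    Hypothetical guard: a caller could skip parsing and plotting for a file
    when this returns False, instead of letting the parser raise on it.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path) as handle:
        return any(line.strip() for line in handle)
```

With a guard like this, an empty retroviral BED file on a small test dataset would be reported as "no data" rather than ending the run with a traceback.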

andypohl commented 7 years ago

Ok. That's why I wanted to know what the expected output is supposed to look like. I won't know it's working unless I can reproduce something. The download of all the genome indexes, etc., was nearly 50 GB. I'm happy to download another 5, 10, 20 GB if it's a better, more realistic example. It's nice to have quick-running toy examples, but I'm more concerned about getting it right than getting it quick. Anyway, I'm pleased it might be nearly working.

frank42195 commented 7 years ago

I get the exact same output trying to run the example file. Have you found a solution? I am completely at a loss on how to get this to work.

bdo311 commented 7 years ago

@frank42195 Unfortunately, this output happens because we updated the script to search for retroviral reads, but our example is too small to include any, so pandas complains about an empty data frame. We hope to push out an update over the next few days to address this. Sorry for the inconvenience!

andypohl commented 7 years ago

Whatever the new example involves, I'll still stress the importance of not just providing the example command, but providing some sort of summary of the output. I know the program creates a ton of output (many files). But as far as that's concerned, I think a priority should be placed on the output that goes to the screen while the program is running. Just having that would bring me a lot of peace of mind that I've installed everything correctly. Thanks for your efforts.

frank42195 commented 7 years ago

Hi,

Since FAST-iCLIP doesn’t work with the example data, I am trying it on some of our own. It is still crashing, this time in fasticlip/retroviralMapping.py, line 149. It appears to be looking for files in the output directory that end with mappedToendoVirus_withDupes.bed but such files don’t exist. Looking at the runLog file, there are no reads in any of the bed files listed when it comes to partitioning reads by type. I have attached the runLog to this email and copied the terminal output below. Is this again a problem with the data? I have been trying off and on for weeks trying to get FAST-iCLIP to run, so far without success.
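The `IndexError` described here is what indexing an empty list of matched files looks like. As a rough sketch of a more informative failure, assuming the script collects its inputs by filename suffix (the helper names `find_mapped_beds` and `require_pair` are hypothetical, and the real lookup in retroviralMapping.py is not shown in this thread):

```python
import glob
import os


def find_mapped_beds(outdir, suffix="mappedToendoVirus_withDupes.bed"):
    """Return sorted paths in `outdir` whose names end with `suffix`."""
    return sorted(glob.glob(os.path.join(outdir, "*" + suffix)))


def require_pair(paths, what):
    """Return (first, second), or raise a descriptive error if fewer than two."""
    if len(paths) < 2:
        raise RuntimeError(
            "expected 2 %s files, found %d: %r" % (what, len(paths), paths)
        )
    return paths[0], paths[1]
```

An error of this kind would name the missing files directly, which makes it easier to tell whether an upstream mapping step produced no output.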

~ Frank

fasticlip --verbose --trimmed -i test/TH81-1_SG_iCLIP_S1_R1_trimmed.fastq test/TH81-1_SG_iCLIP_S1_R2_trimmed.fastq --GRCh38 -a CTACACGTTCAGAGTTCTACAGTCCGACGATC -s /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38_STAR -n frank -o results

Jun 27 16:26:33 ..... Started STAR run
Jun 27 16:26:33 ..... Loading genome
Jun 27 16:26:34 ..... Started mapping
Jun 27 16:49:30 ..... Finished successfully
Jun 27 16:55:03 ..... Started STAR run
Jun 27 16:55:03 ..... Loading genome
Jun 27 16:55:43 ..... Started mapping
Jun 27 16:57:52 ..... Finished successfully
Jun 27 17:33:44 ..... Started STAR run
Jun 27 17:33:44 ..... Loading genome
Jun 27 17:33:46 ..... Started mapping
Jun 27 17:58:40 ..... Finished successfully
Jun 27 18:04:47 ..... Started STAR run
Jun 27 18:04:47 ..... Loading genome
Jun 27 18:31:35 ..... Started mapping
Jun 27 18:45:38 ..... Finished successfully
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
[bam_sort_core] merging from 3 files...
[samopen] SAM header is present: 1 sequences.
[bam_sort_core] merging from 3 files...
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 1 sequences.
[samopen] SAM header is present: 625 sequences.
[samopen] SAM header is present: 625 sequences.
[samopen] SAM header is present: 25 sequences.
[bam_sort_core] merging from 5 files...
[samopen] SAM header is present: 25 sequences.
[bam_sort_core] merging from 3 files...
[samopen] SAM header is present: 625 sequences.
[samopen] SAM header is present: 625 sequences.
[samopen] SAM header is present: 625 sequences.
[samopen] SAM header is present: 625 sequences.

Performing Bowtie...
Process mapped data
Traceback (most recent call last):
  File "fasticlip/retroviralMapping.py", line 149, in <module>
    bedR1=readBed(mappedBed[0])
IndexError: list index out of range
Traceback (most recent call last):
  File "/usr/local/bin/fasticlip", line 11, in <module>
    load_entry_point('fasticlip==0.9.3', 'console_scripts', 'fasticlip')()
  File "build/bdist.linux-x86_64/egg/fasticlip/fasticlip.py", line 439, in main
  File "build/bdist.linux-x86_64/egg/fasticlip/helper.py", line 818, in plot_figure1
  File "build/bdist.linux-x86_64/egg/fasticlip/helper.py", line 734, in plot_ReadAccounting
  File "build/bdist.linux-x86_64/egg/fasticlip/helper.py", line 709, in lineCount
ValueError: invalid literal for int() with base 10: 'cat: results/frank/TH81-1_SG_iCLIP_S1_R1_trimmed_trimmed.fastq: No such file or directory\n0'
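
The string that `int()` choked on above contains a shell error message followed by `0`, which suggests (though the source of `lineCount` is not shown in this thread) that the count is obtained by running a shell command and parsing its combined output. A minimal sketch of counting lines in Python instead, so that a missing file produces a clear error; the name `line_count` is hypothetical:

```python
import os


def line_count(path):
    """Count lines in a file without shelling out.

    Hypothetical replacement for a shell-based counter: a missing file raises
    an error that names the path instead of a confusing int() failure.
    """
    if not os.path.isfile(path):
        raise IOError("cannot count lines, file not found: %s" % path)
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)
```

Here the underlying problem is still the missing `_trimmed_trimmed.fastq` file; the sketch only makes that fact visible at the point of failure.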

Timestamp: 2017-06-27 15:51:55.792093

Parameters used

3' barcode: CTACACGTTCAGAGTTCTACAGTCCGACGATC
'Minimum quality score (q): 25
Percentage of bases with > q: 80
5' bases to trim: 18
'Threshold for minimum number of RT stops (repeat): 2 samples with >= 3 RT stops
Threshold for minimum number of RT stops (nonrepeat): 2 samples with >= 3 RT stops

Processing THis sample frank

Run mapping to indexes.
Mapping test/TH81-1_SG_iCLIP_S1_R1_trimmed.fastq to exoViruses
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/viral/DV test/TH81-1_SG_iCLIP_S1_R1_trimmed.fastq --un test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral_new.fastq -S test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV.sam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV.sam_stats.txt 2>&1
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/viral/ZV test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral.fastq --un test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral_new.fastq -S test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV.sam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV.sam_stats.txt 2>&1
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/viral/HCV_JFH1 test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral.fastq --un test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral_new.fastq -S test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1.sam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1.sam_stats.txt 2>&1
Mapping test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral.fastq to repeat
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/GRCh38/repeat/rep_spaced test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral.fastq --un test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToRepeat.fastq -S test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToRepeat.sam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToRepeat.sam_stats.txt 2>&1
Mapping test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral.fastq to endovirus
STAR --genomeDir /usr/local/FAST-iCLIP/docs/GRCh38/retroviral/ --runThreadN 16 --genomeLoad NoSharedMemory --readFilesIn test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToRepeat.fastq --outFileNamePrefix test/TH81-1_SG_iCLIP_S1_R1_trimmed_endoVirus --alignEndsType EndToEnd --outFilterMismatchNoverLmax 0.08 --outReadsUnmapped Fastx
Mapping test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral.fastq to tRNA
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/GRCh38/trna/tRNA_hg19 test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToendoVirus.fastq --un test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToTrna.fastq -S test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToTrna.sam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToTrna.sam_stats.txt 2>&1
Mapping test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToViral.fastq to genome
STAR --genomeDir /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38_STAR --runThreadN 16 --genomeLoad NoSharedMemory --readFilesIn test/TH81-1_SG_iCLIP_S1_R1_trimmed_notMappedToTrna.fastq --outFileNamePrefix test/TH81-1_SG_iCLIP_S1_R1_trimmed --alignEndsType EndToEnd --outFilterMismatchNoverLmax 0.08
Mapping test/TH81-1_SG_iCLIP_S1_R2_trimmed.fastq to exoViruses
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/viral/DV test/TH81-1_SG_iCLIP_S1_R2_trimmed.fastq --un test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral_new.fastq -S test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV.sam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV.sam_stats.txt 2>&1
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/viral/ZV test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral.fastq --un test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral_new.fastq -S test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV.sam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV.sam_stats.txt 2>&1
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/viral/HCV_JFH1 test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral.fastq --un test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral_new.fastq -S test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1.sam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1.sam_stats.txt 2>&1
Mapping test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral.fastq to repeat
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/GRCh38/repeat/rep_spaced test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral.fastq --un test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToRepeat.fastq -S test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToRepeat.sam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToRepeat.sam_stats.txt 2>&1
Mapping test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral.fastq to endovirus
STAR --genomeDir /usr/local/FAST-iCLIP/docs/GRCh38/retroviral/ --runThreadN 16 --genomeLoad NoSharedMemory --readFilesIn test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToRepeat.fastq --outFileNamePrefix test/TH81-1_SG_iCLIP_S1_R2_trimmed_endoVirus --alignEndsType EndToEnd --outFilterMismatchNoverLmax 0.08 --outReadsUnmapped Fastx
Mapping test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral.fastq to tRNA
bowtie2 -p 8 -x /usr/local/FAST-iCLIP/docs/GRCh38/trna/tRNA_hg19 test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToendoVirus.fastq --un test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToTrna.fastq -S test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToTrna.sam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToTrna.sam_stats.txt 2>&1
Mapping test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToViral.fastq to genome
STAR --genomeDir /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38_STAR --runThreadN 16 --genomeLoad NoSharedMemory --readFilesIn test/TH81-1_SG_iCLIP_S1_R2_trimmed_notMappedToTrna.fastq --outFileNamePrefix test/TH81-1_SG_iCLIP_S1_R2_trimmed --alignEndsType EndToEnd --outFilterMismatchNoverLmax 0.08

Run samtools.
cat test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV_sorted.bam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV.bed
cat test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV_sorted.bam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV.bed
cat test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1_sorted.bam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1.bed
cat test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV_sorted.bam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV.bed
cat test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV_sorted.bam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV.bed
cat test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1_sorted.bam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1.bed
cat test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToRepeat.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToRepeat_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToRepeat_sorted.bam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToRepeat.bed
cat test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToRepeat.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToRepeat_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToRepeat_sorted.bam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToRepeat.bed
cat test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToendoVirus.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToendoVirus_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToendoVirus_sorted.bam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToendoVirus.bed
cat test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToendoVirus.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToendoVirus_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToendoVirus_sorted.bam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToendoVirus.bed
cat test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToTrna.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToTrna_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToTrna_sorted.bam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToTrna.bed
cat test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToTrna.sam | samtools view -q 42 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToTrna_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToTrna_sorted.bam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToTrna.bed
cat test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToGenome.sam | samtools view -q 255 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToGenome_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToGenome_sorted.bam > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToGenome.bed
cat test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToGenome.sam | samtools view -q 255 -Su -F 0x4 - -o - | samtools sort - test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToGenome_sorted
bamToBed -i test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToGenome_sorted.bam > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToGenome.bed

Viral RT stop isolation.
virus: DV
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV.bed
ok
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV.bed
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1.bed
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV.bed
ok
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV.bed
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1.bed
virus: ZV
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV.bed
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV.bed
ok
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1.bed
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV.bed
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV.bed
ok
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1.bed
virus: HCV_JFH1
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToDV.bed
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToZV.bed
bed: test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToHCV_JFH1.bed
ok
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToDV.bed
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToZV.bed
bed: test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToHCV_JFH1.bed
ok
check: DV
check: ZV
check: HCV_JFH1

Repeat RT stop isolation.
Merge Repeat RT stops.
Nonrepeat RT stop isolation.
bedtools intersect -a test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToGenome.bed -b /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38_repeatMasker.bed -wa -v -sorted -s > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToGenome_noRepeat.bed
bedtools intersect -a test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToGenome.bed -b /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38_repeatMasker.bed -wa -wb -sorted -s > test/TH81-1_SG_iCLIP_S1_R1_trimmed_mappedToGenome_repeat.bed
bedtools intersect -a test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToGenome.bed -b /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38_repeatMasker.bed -wa -v -sorted -s > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToGenome_noRepeat.bed
bedtools intersect -a test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToGenome.bed -b /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38_repeatMasker.bed -wa -wb -sorted -s > test/TH81-1_SG_iCLIP_S1_R2_trimmed_mappedToGenome_repeat.bed
Merge Nonrepeat RT stops.

Getting list of snoRNAs

Filtering out snoRNAs and miRNAs
bedtools intersect -a results/frank/frank_threshold=3_GRCh38_allreads.mergedRT.bed -b /usr/local/FAST-iCLIP/docs/GRCh38/snoRNA_coordinates.bed -wa -v -s | sort -k1,1 -k2,2n | bedtools intersect -a - -b /usr/local/FAST-iCLIP/docs/GRCh38/miR_sortclean.bed -wa -v -s -sorted | awk -F '\t' 'BEGIN {OFS="\t"} {print $1,$2,$3,$4 "" NR,$5,$6}' > results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_snoRNAremoved_miRNAremoved.bed

Annotating reads by gene
Make bedGraphs
bedtools genomecov -bg -split -i results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bed -g /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes > results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bedgraph
/usr/local/FAST-iCLIP/bin/bedGraphToBigWig results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bedgraph /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bw
bedtools genomecov -bg -split -i results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bed -g /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes -strand + > results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_plus.bedgraph
/usr/local/FAST-iCLIP/bin/bedGraphToBigWig results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_plus.bedgraph /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_plus.bw
bedtools genomecov -bg -split -i results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bed -g /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes -strand - > results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_minus.bedgraph
/usr/local/FAST-iCLIP/bin/bedGraphToBigWig results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_minus.bedgraph /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_minus.bw
bedtools genomecov -bg -split -i results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bed -g /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes > results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bedgraph
/usr/local/FAST-iCLIP/bin/bedGraphToBigWig results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bedgraph /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bw
bedtools genomecov -bg -split -i results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bed -g /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes -strand + > results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_plus.bedgraph
/usr/local/FAST-iCLIP/bin/bedGraphToBigWig results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_plus.bedgraph /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_plus.bw
bedtools genomecov -bg -split -i results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted.bed -g /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes -strand - > results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_minus.bedgraph
/usr/local/FAST-iCLIP/bin/bedGraphToBigWig results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_minus.bedgraph /usr/local/FAST-iCLIP/docs/GRCh38/GRCh38.sizes results/frank/frank_threshold=3_GRCh38_allreads.mergedRT_cleaned_sorted_minus.bw

Partition reads by type.
countRemainingGeneTypes
No reads in results/frank/clipGenes_bidirectional_promoter_lncRNA_reads.bed
No reads in results/frank/clipGenes_non_coding_reads.bed
No reads in results/frank/clipGenes_3prime_overlapping_ncRNA_reads.bed
No reads in results/frank/clipGenes_rRNA_reads.bed
No reads in results/frank/clipGenes_vaultRNA_reads.bed
No reads in results/frank/clipGenes_Mt_tRNA_reads.bed
No reads in results/frank/clipGenes_snRNA_reads.bed
No reads in results/frank/clipGenes_sRNA_reads.bed
No reads in results/frank/clipGenes_sense_overlapping_reads.bed
No reads in results/frank/clipGenes_ribozyme_reads.bed
No reads in results/frank/clipGenes_processed_transcript_reads.bed
No reads in results/frank/clipGenes_scaRNA_reads.bed
No reads in results/frank/clipGenes_antisense_reads.bed
No reads in results/frank/clipGenes_scRNA_reads.bed
No reads in results/frank/clipGenes_miRNA_reads.bed
No reads in results/frank/clipGenes_sense_intronic_reads.bed
No reads in results/frank/clipGenes_Mt_rRNA_reads.bed
No reads in results/frank/clipGenes_macro_lncRNA_reads.bed
No reads in results/frank/clipGenes_misc_RNA_reads.bed

Intron and UTR analysis.

Run tRNA isotype counting.

Record repeat RNA.
Running Retroviral Mapping

Make plots.
Making Figure 1
test/TH81-1_SG_iCLIP_S1_R1_trimmed.fastq
test/TH81-1_SG_iCLIP_S1_R2_trimmed.fastq
results/frank/TH81-1_SG_iCLIP_S1_R1_trimmed_trimmed.fastq