Open TomKellyGenetics opened 8 years ago
From @methylnick on September 24, 2015 22:52
I have an interesting problem that maybe you all have come across before, and I am just a newbie at all this stuff.
SeqAnswers forum thread fyi. http://seqanswers.com/forums/newreply.php?do=postreply&t=62881
The problem is reverse engineering a fastq file from a BAM (which looks to be paired end).
I also need to randomise the fastq file (treat every four lines of the fastq as one record and randomise the order of those records, in quite a big file), but with the paired-end relationship maintained.
I may have an answer by that time, we shall see.
N
From @smilefreak on September 27, 2015 5:12
Ahh you fired a memory :).
Basically, I had to do this because I wanted to remap some data I was given and was worried about biasing the aligner.
http://gatkforums.broadinstitute.org/discussion/2908/howto-revert-a-bam-file-to-fastq-format
This post has the steps I followed.
From @methylnick on September 28, 2015 1:58
Thanks, my problem is a little more complicated than that, :-)
From @smilefreak on September 28, 2015 2:4
It fits exactly with your problem description. Maybe you can explain a bit more?
Turn BAM to fastq, but do so in a way that randomises the order of these fastq files while maintaining the paired-end relationships.
From @methylnick on September 28, 2015 2:7
my bad for not explaining properly (ps hope you enjoyed the talk). BAM to fastQ, I have another fastq file from the same individual which I want to mush together, while preserving the paired end relationship, but also to randomize.
From @smilefreak on September 28, 2015 2:10
Talk was awesome, I really liked the way he integrated the computer science aspects seamlessly into the discussion about primer design.
Ah, so you have some BAMs that need to go into fastq, and some other fastq files that you want to mush together with them to make one big randomised file, while making sure that pairs are randomised together?
From @methylnick on September 28, 2015 2:11
affirmative :-)
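For reference, the "randomise the pairs together" step can be sketched in plain shell once both mates are in fastq form. This is only a minimal sketch, not the approach anyone in this thread settled on. It assumes GNU `shuf` is available, that both files list mates in the same order, that every record is exactly four lines, and that no header line contains a tab. The file names (`reads_1.fastq`, `reads_2.fastq`, `shuffled_*.fastq`) and the toy reads are made up for illustration.

```shell
set -eu
workdir=$(mktemp -d)

# Toy paired-end input (hypothetical): record N in file 1 is the mate of record N in file 2.
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nCCCC\n+\nIIII\n@r3/1\nGGGG\n+\nIIII\n' > "$workdir/reads_1.fastq"
printf '@r1/2\nTGCA\n+\nIIII\n@r2/2\nGGGG\n+\nIIII\n@r3/2\nCCCC\n+\nIIII\n' > "$workdir/reads_2.fastq"

# Fold each 4-line record onto a single tab-separated line.
paste - - - - < "$workdir/reads_1.fastq" > "$workdir/mate1.tsv"
paste - - - - < "$workdir/reads_2.fastq" > "$workdir/mate2.tsv"

# Put both mates on the same line (fields 1-4 and 5-8), shuffle whole lines
# so a pair always moves together, then unfold back into two fastq files.
paste "$workdir/mate1.tsv" "$workdir/mate2.tsv" \
  | shuf \
  | awk -F'\t' -v out1="$workdir/shuffled_1.fastq" -v out2="$workdir/shuffled_2.fastq" '{
        printf "%s\n%s\n%s\n%s\n", $1, $2, $3, $4 > out1
        printf "%s\n%s\n%s\n%s\n", $5, $6, $7, $8 > out2
    }'
```

To mush in a second data set, its mate files could be concatenated onto `reads_1.fastq` and `reads_2.fastq` (in the same order for both) before the folding step. `shuf` likely needs to hold all of its input in memory, so for a really big file a different strategy may be required.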
From @murraycadzow on September 28, 2015 2:14
Could you not convert your fastq into a BAM, merge it with your existing BAM, and then go through the steps in James's post?
From @murraycadzow on September 28, 2015 2:22
FYI picardtools is probably useful for fastq to sam/bam (http://broadinstitute.github.io/picard/command-line-overview.html#FastqToSam)
From @smilefreak on September 28, 2015 4:1
Anyways, I think the next port of call is an in-person discussion. Regardless of whether you solve this, Nick, please bring it along to HackyHour!! Definitely the kind of problem worth chatting about.
From @methylnick on September 28, 2015 19:53
yes will do James. I am sure there would be people interested in this problem.
From @methylnick on September 28, 2015 21:55
Thanks Murray for that suggestion, I didn't think of it that way. There are some slight header formatting differences that also need to be resolved, and I don't think I'll overcome the problem by hacky hour. Happy to bring it for discussion with you all next hacky hour, it's already pencilled into my diary.
From @smilefreak on October 7, 2015 4:0
Just a reminder for HackyHour tomorrow, 3:30 in the Billiard room of the Staff Club. Let's make it a productive and fun session.
As mentioned elsewhere, I've got a friend with problems working on chromosomal locus data. They're trying to analyse co-location of CRISPR sites, sound like something we can help with?
From @murraycadzow on October 7, 2015 22:23
Can't make any promises without knowing the data, but there should be enough experience in the room to at least suggest a starting point
From @methylnick on October 7, 2015 22:30
ditto, I have pencilled in an hour but would think it would be fun to see the problem.
From @methylnick on October 8, 2015 19:47
Almost there:
awk '{ gsub("/1", "", $1) ; gsub("/2", "", $1) ; print }' N.sam
but:
awk '{ gsub("/1", "", $1) ; gsub("/2", "", $1) ; print }' N.sam | samtools view -b - > test.bam
returns:
[W::sam_read1] parse error at line 1 [main_samview] truncated file.
it seems to have removed the tabs and replaced them with spaces, so the output is no longer valid SAM
More to come.
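A note on the parse error above: when awk modifies a field such as `$1`, it rebuilds the whole line using its output field separator, which defaults to a single space. That matches the observation that the tabs disappeared. A possible fix, sketched below, is to set both the input and output separators to a tab. The sketch also anchors the pattern to the end of the read name and skips `@` header lines. Both of those are additions of mine, not part of the original command. The SAM content here is a made-up toy example.

```shell
set -eu
workdir=$(mktemp -d)

# Toy SAM (hypothetical content): one header line and one unmapped read pair.
printf '@HD\tVN:1.4\tSO:unsorted\n' > "$workdir/N.sam"
printf 'read1/1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> "$workdir/N.sam"
printf 'read1/2\t141\t*\t0\t0\t*\t*\t0\t0\tTGCA\tIIII\n' >> "$workdir/N.sam"

# FS = OFS = "\t" keeps the record tab-separated when awk rebuilds it.
# sub() with a trailing $ strips only a /1 or /2 suffix of the read name,
# and the !/^@/ guard leaves header lines untouched.
awk 'BEGIN { FS = OFS = "\t" }
     !/^@/ { sub(/\/[12]$/, "", $1) }
     { print }' "$workdir/N.sam" > "$workdir/N.fixed.sam"

# Show the field count per line; alignment lines should still have 11 fields.
awk -F'\t' '{ print NF }' "$workdir/N.fixed.sam"
```

The tab-separated output could then be piped into `samtools view -b - > test.bam` as in the original attempt.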
From @murraycadzow on September 23, 2015 22:48
15:30-16:30 Billiard Room, Staffclub The first hacky hour where we bring along and work on our own code in a relaxed environment with the opportunity to discuss and help each other
Copied from original issue: smilefreak/studyGroup#15