Hacky hour 8th October

TomKellyGenetics commented 8 years ago

From @murraycadzow on September 23, 2015 22:48

15:30-16:30, Billiard Room, Staff Club. The first hacky hour, where we bring along and work on our own code in a relaxed environment, with the opportunity to discuss and help each other.

Copied from original issue: smilefreak/studyGroup#15

TomKellyGenetics commented 8 years ago

From @methylnick on September 24, 2015 22:52

I have an interesting problem that maybe you all have come across before, and I am just a newbie at all this stuff.

SeqAnswers forum thread fyi. http://seqanswers.com/forums/newreply.php?do=postreply&t=62881

The problem is reverse engineering a fastq file from a BAM (which looks to be paired end).

I also need to randomise the fastq file (treat every four lines of the fastq as one record and randomise the order of those records, in quite a big file), but with the paired-end relationship maintained.

I may have an answer by that time, we shall see.

N

TomKellyGenetics commented 8 years ago

From @smilefreak on September 27, 2015 5:12

Ahh you fired a memory :).

Basically, I had to do this because I wanted to remap some data I was given and was worried about biasing the aligner.

http://gatkforums.broadinstitute.org/discussion/2908/howto-revert-a-bam-file-to-fastq-format

This post has the steps I followed.

TomKellyGenetics commented 8 years ago

From @methylnick on September 28, 2015 1:58

Thanks, my problem is a little more complicated than that, :-)

TomKellyGenetics commented 8 years ago

From @smilefreak on September 28, 2015 2:4

It fits your problem description exactly; maybe you can explain a bit more?

Turn BAM to fastq, but do so in a way that randomises the order of the reads in these fastq files while maintaining the paired-end relationships.

TomKellyGenetics commented 8 years ago

From @methylnick on September 28, 2015 2:7

My bad for not explaining properly (PS hope you enjoyed the talk). After going from BAM to fastq, I have another fastq file from the same individual that I want to mush together with it, preserving the paired-end relationship while also randomising the order.

TomKellyGenetics commented 8 years ago

From @smilefreak on September 28, 2015 2:10

Talk was awesome, I really liked the way he integrated the computer science aspects seamlessly into the discussion about primer design.

Ah, so you have some BAMs that need to go into fastq, and other fastq files that you want to mush together with them to make one big randomised file, while making sure that pairs are randomised together?
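One way to sketch the shuffle step in shell, assuming the BAM has already been converted to a pair of fastq files: the function below treats every four lines as one record, glues each mate-1 record to its mate-2 record, shuffles the glued rows, and splits them back into two files. The function name and all file names are hypothetical. It assumes both files list mates in the same order, that no line contains a tab, and that GNU `shuf` is available (it holds the whole input in memory, which may matter for a big file).

```shell
# Sketch only: shuffle paired-end fastq records while keeping mates together.
# shuffle_pairs is a hypothetical helper, not part of any toolkit.
shuffle_pairs() {
    # $1, $2: input mate-1 and mate-2 fastq files (same read order in both)
    # $3, $4: shuffled mate-1 and mate-2 output files
    paste "$1" "$2" |      # line i of mate 1 <TAB> line i of mate 2
        paste - - - - |    # four such lines = one read pair on a single row
        shuf |             # randomise the order of whole pairs
        awk -F'\t' -v o1="$3" -v o2="$4" '{
            # odd fields belong to mate 1, even fields to mate 2
            printf "%s\n%s\n%s\n%s\n", $1, $3, $5, $7 > o1
            printf "%s\n%s\n%s\n%s\n", $2, $4, $6, $8 > o2
        }'
}
```

To mush two samples together first, concatenate the mate-1 files and the mate-2 files in the same order (for example `cat a_1.fq b_1.fq > all_1.fq` and `cat a_2.fq b_2.fq > all_2.fq`, names again hypothetical) and pass the results to `shuffle_pairs`.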

TomKellyGenetics commented 8 years ago

From @methylnick on September 28, 2015 2:11

affirmative :-)

TomKellyGenetics commented 8 years ago

From @murraycadzow on September 28, 2015 2:14

Could you not convert your fastq into a BAM, merge it with your existing BAM, and then go through the steps in James's post?

TomKellyGenetics commented 8 years ago

From @murraycadzow on September 28, 2015 2:22

FYI picardtools is probably useful for fastq to sam/bam (http://broadinstitute.github.io/picard/command-line-overview.html#FastqToSam)

TomKellyGenetics commented 8 years ago

From @smilefreak on September 28, 2015 4:1

Anyways, I think the next port of call is an in-person discussion. Regardless of whether you solve this, Nick, please bring it along to HackyHour!! Definitely the kind of problem worth chatting about.

TomKellyGenetics commented 8 years ago

From @methylnick on September 28, 2015 19:53

yes will do James. I am sure there would be people interested in this problem.

TomKellyGenetics commented 8 years ago

From @methylnick on September 28, 2015 21:55

Thanks Murray for that suggestion, I didn't think of it that way. There are some slight header formatting differences that also need to be resolved. I don't think I'll overcome the problem by hacky hour, but I'm happy to bring it for discussion with you all next hacky hour; it's already pencilled into my diary.

TomKellyGenetics commented 8 years ago

From @smilefreak on October 7, 2015 4:0

Just a reminder for HackyHour tomorrow, 3:30 in the Billiard room of the Staff Club. Let's make it a productive and fun session.

TomKellyGenetics commented 8 years ago

As mentioned elsewhere, I've got a friend with problems working on chromosomal locus data. They're trying to analyse co-location of CRISPR sites. Does that sound like something we can help with?

TomKellyGenetics commented 8 years ago

From @murraycadzow on October 7, 2015 22:23

Can't make any promises without knowing the data, but there should be enough experience in the room to at least suggest a starting point.

TomKellyGenetics commented 8 years ago

From @methylnick on October 7, 2015 22:30

ditto, I have pencilled in an hour but would think it would be fun to see the problem.

TomKellyGenetics commented 8 years ago

From @methylnick on October 8, 2015 19:47

Almost there:

awk '{ gsub("/1", "", $1) ; gsub("/2", "", $1) ; print }' N.sam

but:

awk '{ gsub("/1", "", $1) ; gsub("/2", "", $1) ; print }' N.sam | samtools view -b - > test.bam

returns:

[W::sam_read1] parse error at line 1
[main_samview] truncated file.

It seems to have replaced the tabs with spaces, so the output is no longer valid SAM.
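The tab loss is consistent with how awk works: whenever a field is modified, awk rebuilds the record using the output field separator `OFS`, which defaults to a single space. A possible fix, sketched below, is to set both `FS` and `OFS` to a tab and to pass `@` header lines through untouched. The function name is hypothetical, and the pattern is anchored to the end of the read name (an assumption about where the `/1` and `/2` suffixes sit) so that names such as `read/10` are left alone.

```shell
# Sketch only: strip a trailing /1 or /2 from read names and keep the SAM tab-delimited.
# strip_mate_suffix is a hypothetical helper name.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }   # read and write tab-separated fields
         /^@/  { print; next }       # header lines pass through unchanged
               { sub(/\/[12]$/, "", $1); print }' "$@"
}
```

With this in place, something like `strip_mate_suffix N.sam | samtools view -b - > test.bam` should at least keep the stream tab-delimited; whether samtools accepts it still depends on the rest of the file, such as the headers.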

More to come.

TomKellyGenetics / studyGroup

Hacky hour 8th October #5