Closed Deena-B closed 5 years ago
@mkeisenbach, you are the first person to submit a pull request to this repo! Kudos, and thank you for the solution! @Deena-B, we need a better file structure for the code and data. Right now @mkeisenbach's code and data sit in the root directory, so we need to refactor them into proper places. After that we should be able to close the issue, right?
moved to data folder
Problem
There are two txt files, named r1_list.txt and r2_list.txt here: https://github.com/deepcelllineage/mitolin/tree/master/generated/nguyen_nc_2018/ind2/
R stands for read. r1 is the first read (aka the sequence as read from the forward direction) and r2 is the second read (aka the sequence as read from the reverse direction).
Each file holds a list of filenames and the filenames should match, but a line count revealed that there is one extra filename in one of the lists.
We need to iterate through these lists and generate new lists that contain only the filenames that have a matching pair, dropping any filename whose mate is missing from the other list.
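One way the pairing step could look is sketched below. This is only a rough sketch, not the solution from the pull request. It assumes that mate filenames differ only by a read marker of the form `_1` / `_2` placed right before the first file extension (for example `sample_1.fastq` and `sample_2.fastq`); the actual naming scheme in the lists may differ, in which case `READ_MARKER` would need adjusting. The names `pair_key`, `keep_pairs`, `read_list`, and `write_list` are hypothetical helpers introduced for illustration.

```python
import re
from pathlib import Path

# ASSUMPTION: the read number appears as "_1" or "_2" immediately before a
# file extension, e.g. "sample_1.fastq". Adjust to the real naming scheme.
READ_MARKER = re.compile(r"_[12](?=\.)")


def pair_key(filename):
    """Mask the read marker so that the two mates of a pair compare equal."""
    return READ_MARKER.sub("_N", filename.strip(), count=1)


def keep_pairs(r1_names, r2_names):
    """Return (r1, r2) lists holding only names whose mate is in the other list.

    The original order of each list is preserved.
    """
    shared = {pair_key(n) for n in r1_names} & {pair_key(n) for n in r2_names}
    r1_pairs = [n for n in r1_names if pair_key(n) in shared]
    r2_pairs = [n for n in r2_names if pair_key(n) in shared]
    return r1_pairs, r2_pairs


def read_list(path):
    """Read one filename per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_list(path, names):
    """Write one filename per line."""
    Path(path).write_text("\n".join(names) + "\n")
```

To use it, one would call `keep_pairs(read_list("r1_list.txt"), read_list("r2_list.txt"))` and pass each returned list to `write_list` with the desired output filename. Comparing sets of masked keys, rather than walking the two lists in lockstep, means a single extra line in either file does not shift every later comparison out of alignment.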
Output
Please name the newly generated files "r1_list_pairs.txt" and "r2_list_pairs.txt" and upload them to the same directory: https://github.com/deepcelllineage/mitolin/tree/master/generated/nguyen_nc_2018/ind2/
Please also upload a Jupyter notebook or Markdown file that walks through the steps you took to clean the lists, using the filename and directory below:
FILENAME: "DATE_pair_r1r2.md" (where DATE is replaced by a string like 20190521, and .md may be replaced by .ipynb)
DIRECTORY: https://github.com/deepcelllineage/mitolin/lab_notebook/