@cecileane and @coraallencoleman
Hello!
Here is the command I used to accomplish the first part of the task:
sed -E 's/\"([0-9]+),([0-9]+)/\1\2/' tableofSNPs.csv | sed -E 's/([0-9]+),([0-9]+)\"/\1\2/' | sed -E 's/([0-9]+)\"/\1/' > cleantableofSNPs.csv
The first sed removes the opening " and the comma that follows the first group of digits. The second removes the next comma together with the closing ", which is only needed when the number has 2 commas. A number with only one comma has no comma left inside it after the first step, so the third sed handles it by removing any " that directly follows a digit. None of the three substitutions uses the g flag, so each one acts only on the first match in a line, which assumes at most one quoted number per line. I redirected the result into a new file that contains the clean numbers with no " or commas.
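One caveat with the pipeline above: the second substitution is not tied to the quoted field, so if the field just before a one-comma number ends in a digit, the pattern ([0-9]+),([0-9]+)" would match across the field boundary and glue the two fields together. Whether that happens depends on the column layout, which is not shown here. Below is a minimal sketch of an alternative that only removes commas found between a pair of quotes, for any number of commas. The function name strip_thousands and the sample rows are made up for illustration and are not taken from tableofSNPs.csv.

```shell
# strip_thousands: read CSV text on stdin; inside quoted numbers such as
# "6,977,293", drop the commas one at a time, then drop the quotes.
strip_thousands() {
  sed -E \
    -e ':again' \
    -e 's/("[0-9]+),([0-9,]*")/\1\2/' \
    -e 't again' \
    -e 's/"([0-9]+)"/\1/g'
}

# Hypothetical rows (assumed layout, not the real file).
printf '%s\n' '1,"6,977,293",A,G' '1,"977,293",T,C' | strip_thousands
```

On the real file this could be run as strip_thousands < tableofSNPs.csv > cleantableofSNPs.csv. It does not try to handle escaped quotes inside a field.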
For the second task, I used the following code:
sed 's/[^,]//g' cleantableofSNPs.csv | awk '{ print length }'
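The command above prints one number per line, which is a lot to read for a long file. Here is a minimal sketch, assuming the standard sort, uniq, and awk tools, that summarizes the counts over every line and lists any line that does not have the expected number of commas. The function names comma_tally and odd_lines and the sample rows are hypothetical.

```shell
# comma_tally: for CSV text on stdin, print how many lines have each comma count.
comma_tally() {
  sed 's/[^,]//g' | awk '{ print length }' | sort -n | uniq -c
}

# odd_lines N: print "line number: line" for every line that does not have N commas.
odd_lines() {
  awk -F, -v n="$1" 'NF - 1 != n { print NR ": " $0 }'
}

# Hypothetical rows (not the real file); the third row is missing a comma on purpose.
sample='1,6977293,A,G
2,977293,T,C
3977293,A,G'

printf '%s\n' "$sample" | comma_tally
printf '%s\n' "$sample" | odd_lines 3
```

If the file is clean, comma_tally < cleantableofSNPs.csv should print a single row, and odd_lines 3 < cleantableofSNPs.csv should print nothing.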
sed removes everything except the commas, and then awk counts the number of characters that remain in each line. If you check the output using head and tail, every line shown has 3 commas, so it seems safe to assume that all lines have 3 commas, although this only inspects the first and last few lines.
Finally, to switch all of the A's and T's, I used the following code:
sed -E 's/T/Y/g' cleantableofSNPs.csv | sed -E 's/A/T/g' | sed -E 's/Y/A/g' > finalcleantableofSNPs.csv
This first changes all of the T's to Y's, a placeholder that marks which positions need to be A's at the end (this assumes the file contains no Y's to begin with). Then it changes all of the A's to T's, and finally it replaces the Y placeholders with A's. The g flag makes each substitution apply to every match in a line rather than only the first one. The result is redirected into a new final clean table file.
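The placeholder is only needed because the three substitutions run one after another. Here is a minimal sketch of a one-pass alternative, using tr (or the y command of sed) to map A to T and T to A at the same time, so no placeholder letter is involved. The function names and sample rows are hypothetical, and the second function assumes the file has a header row in line 1 that should be left alone, which the post does not say.

```shell
# swap_at: swap every A with T and every T with A in the text on stdin.
swap_at() {
  tr 'AT' 'TA'
}

# swap_at_keep_header: same swap, but leave line 1 untouched
# (assumes line 1 is a header row).
swap_at_keep_header() {
  sed '1!y/AT/TA/'
}

# Hypothetical rows (not the real file).
printf '%s\n' '1,6977293,A,T' '2,977293,T,T' | swap_at
printf '%s\n' 'Chr,Pos,Allele1,Allele2' '1,6977293,A,T' | swap_at_keep_header
```

Both versions change every A and T they see, so they are only appropriate if those letters appear nowhere except in the columns that should be switched.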
We could check this by counting the T's and A's before the change and comparing them to the counts afterwards: the two counts should trade places. This only works as a check if the file does not contain exactly the same number of T's and A's, because in that case the counts look the same whether or not the switch happened.
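The check described above can be sketched as follows. The counts are taken with tr and wc, and a second, stricter check applies the switch again and compares the result with the original, which also works when the file happens to contain equally many A's and T's. The function name count_letter and the sample text are hypothetical.

```shell
# count_letter LETTER: count how often LETTER occurs in the text on stdin.
count_letter() {
  tr -cd "$1" | wc -c | tr -d '[:space:]'
}

# Hypothetical data standing in for the file before and after the switch.
before='1,6977293,A,T
2,977293,T,T'
after=$(printf '%s\n' "$before" | tr 'AT' 'TA')

# The two counts should trade places.
echo "before: A=$(printf '%s\n' "$before" | count_letter A) T=$(printf '%s\n' "$before" | count_letter T)"
echo "after:  A=$(printf '%s\n' "$after" | count_letter A) T=$(printf '%s\n' "$after" | count_letter T)"

# Stricter check: switching a second time should give back the original.
if [ "$(printf '%s\n' "$after" | tr 'AT' 'TA')" = "$before" ]; then
  echo "round trip OK"
fi
```

With the real files, the round trip could be written as tr 'AT' 'TA' < finalcleantableofSNPs.csv | cmp - cleantableofSNPs.csv, which prints nothing when the two match.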
Best, Shannon