Public-Health-Scotland / source-linkage-files

This repo is for the syntax used for the PHS Source Linkage File project.
https://public-health-scotland.github.io/source-linkage-files/

Update `00_sort_bi_extracts` to write anon_chi #952

Closed: Jennit07 closed this pull request 4 months ago.

Jennit07 commented 4 months ago

As part of ensuring that anon_chi is used for data written to disk, this updates the sort BI extracts script. I'm opening it as a draft for now. It works for extracts that have a chi variable, but we also take some extracts without chi, e.g. GP OOH data. Some thought is needed to extend the code to handle these, possibly with a conditional statement.
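The conditional mentioned above could work roughly as follows. This is a minimal Python sketch, not the project's R code. It assumes each extract is held as a list of row dicts and that the identifier columns are named `chi` and `anon_chi`. The `chi_to_anon` lookup is hypothetical and stands in for whatever anonymisation step the real script uses.

```python
from typing import Mapping


def convert_chi(rows: list[dict], chi_to_anon: Mapping[str, str]) -> list[dict]:
    """Swap chi for anon_chi when the extract has a chi column.

    Extracts with no chi column (e.g. GP OOH) are returned unchanged.
    `chi_to_anon` is a hypothetical lookup, not the real anonymisation step.
    """
    if not rows or "chi" not in rows[0]:
        # No chi column: nothing to anonymise, so pass the extract through
        return rows
    converted = []
    for row in rows:
        # Drop the identifiable chi and keep every other column
        new_row = {k: v for k, v in row.items() if k != "chi"}
        # Unmatched chi values become None rather than raising an error
        new_row["anon_chi"] = chi_to_anon.get(row["chi"])
        converted.append(new_row)
    return converted
```

Passing chi-less extracts straight through means a single code path can process every extract, with no special-casing for each data source.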

github-actions[bot] commented 4 months ago

@check-spelling-bot Report

:red_circle: Please review

See the :open_file_folder: files view, the :scroll:action log, or :memo: job summary for details.

Unrecognized words (1)

sourcedev

To accept these unrecognized words as correct, you could run the following commands ... in a clone of the [git@github.com:Public-Health-Scotland/source-linkage-files.git](https://github.com/Public-Health-Scotland/source-linkage-files.git) repository on the `amend_sort_bi_extracts` branch ([:information_source: how do I use this?](https://github.com/check-spelling/check-spelling/wiki/Accepting-Suggestions)):

``` sh
curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/main/apply.pl' | perl - 'https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/9127422337/attempts/1'
```

OR

To have the bot accept them for you, reply quoting the following line: @check-spelling-bot apply updates.

Errors (1)

See the [:open_file_folder: files](https://github.com/Public-Health-Scotland/source-linkage-files/pull/952/files/) view, the [:scroll:action log](https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/9127422337/job/25097654945#step:4:1), or [:memo: job summary](https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/9127422337/attempts/1#summary-25097654945) for details.

| [:x: Errors](https://github.com/check-spelling/check-spelling/wiki/Event-descriptions) | Count |
|-|-|
| [:x: forbidden-pattern](https://github.com/check-spelling/check-spelling/wiki/Event-descriptions#forbidden-pattern) | 1 |

See [:x: Event descriptions](https://github.com/check-spelling/check-spelling/wiki/Event-descriptions) for more information.
If the flagged items are :exploding_head: false positives

If items relate to a ...

* binary file (or some other file you wouldn't want to check at all).

  Please add a file path to the `excludes.txt` file matching the containing file.

  File paths are Perl 5 Regular Expressions - you can [test](https://www.regexplanet.com/advanced/perl/) yours before committing to verify it will match your files.

  `^` refers to the file's path from the root of the repository, so `^README\.md$` would exclude [README.md](../tree/HEAD/README.md) (on whichever branch you're using).

* well-formed pattern.

  If you can write a [pattern](https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples:-patterns) that would match it, try adding it to the `patterns.txt` file.

  Patterns are Perl 5 Regular Expressions - you can [test](https://www.regexplanet.com/advanced/perl/) yours before committing to verify it will match your lines.

  Note that patterns can't match multiline strings.
lizihao-anu commented 4 months ago

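Updated 00_Sort_BI_Extracts.R to

  • enable parallel computing
  • tell whether a file has a chi column or not

It works well in my testing. But I do have a question about the new file name. Since we have added the prefix "anon-" to those files, will it break the existing data pipeline? If so, we should either

  • modify the functions that pick up those files, or
  • not add the prefix to the file names.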
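lizihao-anu's update processes the extracts in parallel and checks whether each file has a chi column. The following Python sketch shows both ideas. It reads only the CSV header of each file, which keeps the check cheap even for large extracts. It assumes CSV extracts and a column literally named `chi`, which is passed as a parameter because real BOXI headers may differ. A thread pool is used here because header reads are I/O-bound. CPU-heavy sorting would need a process pool instead. The R script's actual parallel backend is not shown in this thread.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def has_chi_column(path: Path, chi_col: str = "chi") -> bool:
    """Return True if the CSV header row contains `chi_col` (case-insensitive)."""
    # Only the first row is read, so large extracts are not loaded into memory.
    # utf-8-sig strips a byte-order mark if the export tool wrote one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    return chi_col.lower() in (name.strip().lower() for name in header)


def scan_extracts(paths: list[Path], chi_col: str = "chi", workers: int = 4) -> dict[str, bool]:
    """Check every extract concurrently and map file name -> has chi column."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: has_chi_column(p, chi_col), paths)
        return {p.name: found for p, found in zip(paths, results)}
```

The resulting mapping can then decide, file by file, whether the chi-to-anon_chi conversion runs or the extract is passed through as-is.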

Jennit07 commented 4 months ago

> Updated 00_Sort_BI_Extracts.R to
>
>   • enable parallel computing
>   • tell whether a file has a chi column or not
>
> It works well in my testing. But I do have a question about the new file name. Since we have added the prefix "anon-" to those files, will it break the existing data pipeline? If so, we should either
>
>   • modify the functions that pick up those files, or
>   • not add the prefix to the file names.

Hi @lizihao-anu, thanks for all your help with this. We want to keep the anon- prefix for consistency, because the updated get_boxi_extract path already applies it to all extracts now that we are changing everything to anon_chi. I have made the changes we discussed and tested them on a file. I am happy the function is working as intended.

I will mark this PR ready for review. Could you please do a final check to confirm you are happy with it, then approve and merge it into the June24 update branch? Thanks! :)
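Keeping the anon- prefix means both the writer and every downstream reader have to agree on the prefixed names. Here is a Python sketch of both sides. The helpers `anon_output_path` and `find_anon_extracts` are hypothetical. They do not reproduce the project's R functions, such as the get_boxi_extract path mentioned above, and the only naming rule assumed is the "anon-" prefix discussed in this PR.

```python
from pathlib import Path

ANON_PREFIX = "anon-"  # prefix discussed in this PR


def anon_output_path(source: Path, out_dir: Path) -> Path:
    """Build the output path for a sorted extract, adding the prefix only once."""
    name = source.name
    if not name.startswith(ANON_PREFIX):
        name = ANON_PREFIX + name
    return out_dir / name


def find_anon_extracts(directory: Path, pattern: str = "*") -> list[Path]:
    """Pick up only prefixed extracts, so readers match the new file names."""
    return sorted(directory.glob(ANON_PREFIX + pattern))
```

Adding the prefix only when it is missing means re-running the sort step cannot produce `anon-anon-` names. Globbing on the prefix also keeps older, unprefixed files from being picked up by mistake.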
