File revision - Githubissues

roldud commented 1 year ago

To do a good traceability, one needs to show also the revision of the document that was used as a base. To keep things simple, I can see multiple ways to uniquely identify the source document. But, basically, once you get the document name, and the path to the repository, then you're in pretty good shape. Would you accept a patch that adds a new option for a document to call a script that fetches the document identifier?

What I'm thinking is something like:

document FOOBAR 
    -path relative/path/my.doc
    -getid relative/path/to/getid_script.sh

And then getid_script.sh should return a one-liner like any of:

my.doc git ab54cd78ef git://my.repo/hoge.git
https://my.server.internal/trunk/path/to/my.doc r12345
version 1.2.3 r4567 @server or any other corresponding arbitrary few-line script.

The result would be like

goeb commented 1 year ago

We can discuss of that.

My first concern is cybersecurity. Could a malicious person forge a configuration file or document that would have a final user execute malicious code unknowingly?

Example 1: relative/path/to/getid_script.sh points to a malicious code. Example 2: If relative/path/to/getid_script.sh parsed a document for some content (maybe not your use case, but we can imagine this use case) and the document was maliciously modified to have relative/path/to/getid_script.sh return javascript, then this javascript could get executed when the final user opens the reqflow HTML report.

If think of this because the log4shell vulnerability exploited a feature of code execution that was not very used and most people were unaware of.

I wish reqflow had not the responsibility of executing another program.

I think of another way for achieving a similar result to what you describe: have reqflow insert a user-provided text in a text block inside the report. The workflow would be:

an external program prepares a table of references (the info that you need), in text, html, whatever...
a new feature of reqflow embeds this data in the report

roldud commented 1 year ago

The cybersecurity concern is relevant indeed. Going around this topic, I have been wondering for some time if .req configuration files are better kept static, or should be auto-generated. If we consider that it is usual for the conf .req files to be auto-generated, then it would be sufficient to search-and-replace, and the .req config file would just need to specify a new option like -source "string". This is the template approach. If we consider that .req file should be static, and the source and references can be adapted, then the external file giving the references can make sense. In this case however, even the -path option could become optional, as such a secondary reference file/table would likely use the document by name.

In both case, the result however will be that the options will naturally group in two categories:

parsing options: -req, -ref, -start-after
document identification options: -path, -source, -link (which would place a certain link instead of the direct path in the report, to allow a distributed document source to still produce a clickable report)

Going either way is a product manager decision, so what's your take? Probably the template approach requires more boilerplate to integrate in CI and such, while the second approach requires more configuration files. Which may also be generated, actually.

goeb commented 1 year ago

The external file idea (that by the way would be similar to the htmlcss directive) would be more general than just covering your use case. It would allow the insertion of any HTML snippet in the HTML report, in a place global to all documents (not in the paragraphs of the documents). I do not think that reqflow should parse it, and thus it could not be used to provide the -path argument.

If you need to insert an identification line for each document within the related paragraph, as in your initial example, then we could add a new option like -source "string".

With the help of an environment variable it would look like this:

document TEST -path TEST.txt \
    -source "$TEST_SRC" \
    -req T_[-a-zA-Z0-9_]* \
    -ref "Ref:  *(.*)"

Or we could make it more generic with a -meta option, in order to cover more use cases:

document TEST -path TEST.txt \
    -meta "Source" "$TEST_SRC" \
    -req T_[-a-zA-Z0-9_]* \
    -ref "Ref:  *(.*)"

I like the -meta way (by the way would you have a better name than -meta?):

it can be customized
it looks easy to implement
one can imagine to enable later several arguments -meta for a single document (though not necessary for the moment)

roldud commented 1 year ago

In a way, you're pointing in the direction that reqflow should be capable of using more than one configuration file. Doing so, one can be the main conf with the document list, and another would hold several macros that could be gathered by a script. Multi-config file seems simple enough to handle indeed.

I didn't have an issue with -meta, but if you really want an alternative, I can propose -extra, -info, -prop, with -extra being my favorite.

To summarize, this discussion goes into two small changes:

add multi-file input for configuration of reqflow,
add -extra to allow inserting one or more strings below the document pointer. Right?

goeb commented 1 year ago

Could you elaborate on the multi-file configuration and give an example to illustrate?

roldud commented 1 year ago

Think of it like this:

generated.req:

define SPEC_SRC "https://github.com/goeb/reqflow/blob/285332e8167e27a209fbd0fdba72e7349d68f2ea/test/SPEC.docx"

conf.req:

define REQ_PATTERN PRINTF_[-a-zA-Z0-9_]*
define ALPHANUM [-a-zA-Z0-9_]
document SPEC
    -path SPEC.docx
    -req REQ_PATTERN
    -stop-after Annexe -ref "Ref: +(.*)"
    -end-req-style toto -end-req tutu
    -sort alphanum
    -extra "Type: Software Specification Document" 
    -extra $SPEC_SRC

Then invoke reqflow trac -x html generated.req conf.req -o report.html to pass the two files.

And the intention is that, as a preparation step, generated.req is created by some magic shell script that will fetch the revision and path of the file. There are a few scripts out there to do this depending on the VCS used.

goeb commented 1 year ago

If I correctly understand your proposal, analyzing the files generated.req and conf.req would produce the same result as the current implementation analyzing a concatenation of both files. I would agree with that. However I feel that using environment variables (with a dollar sign) would meet your need as well, and you should consider it first.

Example using an environment variable:

export SPEC_SRC="https://github.com/goeb/reqflow/blob/285332e8167e27a209fbd0fdba72e7349d68f2ea/test/SPEC.docx"
reqflow trac ...

I agree with the keyword -extra.

I have a few remarks:

Your proposal would not be compatible with the usage 2. reqflow <config> (used for running reqflow by double-clicking on a config file). We can still decide that it is a limitation: this use case does not accept multi-file config.
We have to keep the -c flag for compatibility and consistency. This is not a problem, as we can have reqflow accept several -c arguments. Eg: reqflow trac -x html -c generated.req -c conf.req -o report.html.
Reqflow has the ability to read the config from STDIN (option -c -). This is compatible with multi-file config.
In you example, I think you got confused about SPEC_SRC:
- $SPEC_SRC refers to an environment variable
- SPEC_SRC (without the dollar sign) may refer to a define
With, -extra $SPEC_SRC, how do you handle your need for the word "Source" in the final report?

roldud commented 1 year ago

However I feel that using environment variables (with a dollar sign) would meet your need as well, and you should consider it first.

Well, my current dataset has ~1200 files to parse (including one file per API, that's why it's so big). I'll expire the environment size pretty quickly with the environment variable approach.

In you example, I think you got confused about SPEC_SRC:

$SPEC_SRC refers to an environment variable

SPEC_SRC (without the dollar sign) may refer to a define

I know this is documented, but somehow written next to each other it's so much clearer! Indeed, I got confused between the two.

With, -extra $SPEC_SRC, how do you handle your need for the word "Source" in the final report?

I'd guess -extra "Source: $SPEC_SRC", but indeed this may require environment variable parsing in strings, which I'm not sure is done today. Or, directly tell the user to prepare the line externally, and force them to include the text "Source: " directly inside the environment variable/macro. The latter is not totally unreasonable since we intend for the data to be script-generated.

goeb commented 1 year ago

For the record, I just tried 10,000 environment variables or 10,000 command line arguments. Both work in bash. I suppose it uses a few hundreds of ko in RAM.

Anyway, I agree for multi-file config.

As for the -extra "Source: $SPEC_SRC", okay. It should be simple to make it work.

The parsing is done as follows:

First reqflow splits the configuration in tokens. Eg: ['document', 'TEST', '-path', 'TEST.txt', '-extra', 'Source: $SPEC_SRC']
Then, on data tokens only (and not on flag tokens), it calls consolidateToken() which replaces defines (replaceDefinedVariable()) and environment variables (replaceEnv()).

If you send such a patch, I will take it with pleasure. I will review it and I can help on the test and documentation parts.

goeb / reqflow

File revision #23