Closed srbcheema1 closed 6 years ago
@cyenyxe @jmmut
Added a function to check if a file is compressed or not
This function prints a warning message if the file is compressed.
Please review
I have made the requested the changes to validator.cpp
file and also extracted strings to string_constants.hpp
file.
Tests are yet to be made.
Have a look at the changes. Once the changes get approved I will squash the commits into one.
I have made the requested changes. Please have a look.
@jmmut I want to discuss something regarding #107 issue .
there are tools in boost iostream library to read bz2 and gzip compression formats .
what about returning compression type from is_compressed_file
and this would help us in reading the compressed files .
I wouldn't worry about #107. For now, just make simple code that does one thing, and when we start that issue, we'll deal with any needed refactoring.
Really. One of the biggest lessons I learnt in recent years is how important is to make simple code. One problem at a time. Usually trying to be too clever makes things worse.
Let's finish this issue first, I'd say only the tests are left. Let me know if you have any doubt on how to design the tests.
I can figure out that i need to add testcase to test/vcf/parser_v*_.test.cpp
file.
Am I going in right direction ?
There are 2 types of tests we can do here.
the integration ones are almost done. Just rename the compressed files you added from test/input_files/v4.3/compressed_files/failed_body_alt_000.vcf.*
to test/input_files/v4.3/failed/compressed.vcf.*
. that way, the test/vcf/parser_v43_test.cpp will try to do a whole validation on those files, and expect them to fail. You don't need to modify test/vcf/parser_v43_test.cpp.
the unit tests should use the functions and check that they return true
and false
appropiately. For that, you'll have to move the declarations to validator.hpp (I didn't realise they were in the cpp until now). You can add a simple test file, taking as example ploidy_test.cpp, for instance. Testing the extension function is easy. To test the magic number, you might need to write into a stringstream those bytes, but should be easy as well.
I have made a file tests/vcf/compressed_files_test.cpp
. In that i have added tests to test is_compressed_magic_num()
and is_compressed_extension
I have renamed those compressed files .
I haven't yet moved those files into failed folder. By keeping them in seperate folder it was easy for me to write tests for files that return true
on is_compressed_*()
I have added the tests for non compressed files also. I have also added a special test to check working of seekg()
.
@jmmut tell me if I should squash the commits. I will squash similar commits into one once you approve the changes.
@jmmut there was a minor issue with seekg . It was behaving abnormally for files with number of characters less than 9 characters. I fixed it as explained in https://stackoverflow.com/questions/48979934/seekg-not-behaving-as-expected using input.clear() please have a look at it. I am looking forward to get this PR merged.
hm, that is exactly the issue I was afraid of. what's more, it's not fixed. if you try this:
./builds/debug/bin/vcf_validator -i test/input_files/v4.3/passed/complexfile_passed_000.vcf
it will say it's valid, but if you pass it through a pipe instead, it will say it's not valid:
cat test/input_files/v4.3/passed/complexfile_passed_000.vcf | ./builds/debug/bin/vcf_validator
I suggest leaving the approach of seekg and using a copy of the first line as done in https://github.com/EBIvariation/vcf-validator/blob/develop/src/vcf/validator.cpp#L153. there you can see that the first line is read, and it's used for detecting the version in line 145, and to start the validation in line 153.
tests were passing. I will check that out.
I too was thinking to use first line https://github.com/EBIvariation/vcf-validator/pull/111#discussion_r170569585
earlier when you mentioned about that in https://github.com/EBIvariation/vcf-validator/pull/111#discussion_r166876466 I searched such function in validator.hpp
(maybe you made typo while writingvalidator.cpp
) but was unable to find one. by that time i got approach of seekg as i have used it in filestreams but i was doubtful for using it in std::cin .
yes, that's what I meant, sorry for not being clear enough. I'll try to be more specific in the future, and please ask any doubt you have.
I have tested it using these commands
./build/bin/vcf_validator < test/input_files/v4.3/passed/complexfile_passed_000.vcf
and this one
./build/bin/vcf_validator -i test/input_files/v4.3/passed/complexfile_passed_000.vcf
I use command < filename
more than cat filename| command
maybe thats why these things went unnoticed . moreover our testcases are not capable to check this bug out .
Didn't know cat file |
and < file
could behave so differently. I used cat file |
because if the file is compressed the way to use it is simpler and more similar, as zcat file.gz |
.
Agree that the tests (as they are right now) can not check this. This is another reason to go away from stream handling.
hmm strange it is. Till now i thought both the ways similar to giving standard input
@jmmut I have made the requested changes. Please have a look at it.
This PR adds the feature to display warning on providing compressed VCF file as STDIN or using -i parameter. In reference to the issue #104