Open lucyrose93 opened 7 years ago
We can use Node's core fs module, which has a method called readFile. It takes the filename as its first parameter and a callback function as its last. In the callback, we can call filecontent.indexOf('string'). If the result is greater than -1, the string was found in the text file.
require("fs").readFile("filename.ext", function(err, filecontent) {
  if (err)
    throw err;
  console.log("String" + (filecontent.indexOf("search string") > -1 ? " " : " not ") + "found");
});
From a guy on SO trying to search through a huge (2 mill +) dataset:
I have split the records into different text files (at most 200 records per file) and put the files in different directories (I used the content of one data field to determine the directory tree). I end up with about 50000 files in about 40000 directories. I have then run Lucene to index the files. Searching for a string with the Lucene demo program is pretty fast. Splitting and indexing took a few minutes: this is totally acceptable for me because it is a static data set that I want to query.
So for the large .txt file maybe we will have to split the data into smaller files (a-z possibly) and then narrow down from there.
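To make the a-z idea concrete, here is a rough sketch that groups lines into buckets by their first letter. The helper name bucketByFirstLetter and the sample lines are made up for illustration, and it assumes one record per line. Note that this only narrows things down when we search by how a record starts, not for a string anywhere inside it.

```javascript
// Hypothetical helper: group lines into buckets keyed by first letter (a-z).
// Anything that does not start with a letter a-z goes under '_'.
function bucketByFirstLetter(lines) {
  var buckets = {};
  lines.forEach(function (line) {
    var first = line.charAt(0).toLowerCase();
    var key = first >= 'a' && first <= 'z' ? first : '_';
    (buckets[key] = buckets[key] || []).push(line);
  });
  return buckets;
}

// Made-up sample data, standing in for lines read from the big file.
var buckets = bucketByFirstLetter(['Apple', 'avocado', 'banana', '42']);
console.log(Object.keys(buckets));
```

Each bucket could then be written to its own file, so a lookup for a record starting with "b" only has to read that one file.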
http://lunrjs.com/ <-- a search framework we could inspect
Wow -- that's thinking big! .json is, alternatively, easy to manipulate with JSON.parse() and JSON.stringify()
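A small sketch of what the .json route might look like. The record shape (id, text) and the name searchRecords are assumptions for illustration only, since the real data set is not shown here.

```javascript
// Hypothetical records; the real field names may differ.
var records = [
  { id: 1, text: 'apple pie' },
  { id: 2, text: 'banana bread' }
];

// JSON.stringify turns the array into a string that could be saved as a .json file.
var json = JSON.stringify(records);

// JSON.parse turns that string back into an array, which filter can then search.
function searchRecords(jsonText, term) {
  return JSON.parse(jsonText).filter(function (record) {
    return record.text.indexOf(term) > -1;
  });
}

var matches = searchRecords(json, 'banana');
console.log(matches);
```

The trade-off is that JSON.parse needs the whole file in memory at once, which may matter for a very large data set.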
Apparently a stream can handle larger files, because it reads the file in chunks instead of loading it all into memory. It's on the fs module:
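var fs = require('fs');

// Calls then(err, found) once; found is true if content occurs in the file.
function fileContains(path, content, then) {
  var found = false;
  var done = false;
  function finish(err) {
    if (done) return; // 'close' also fires after 'error'
    done = true;
    then(err, found);
  }
  var stream = fs.createReadStream(path, { encoding: 'utf8' });
  stream.on('data', function (d) {
    // Plain string search; a match split across two chunks is missed here.
    if (!found) found = d.indexOf(content) > -1;
  });
  stream.on('error', finish);
  stream.on('close', function () { finish(null); });
}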
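One thing to watch with the 'data' handler above: the file arrives in chunks, so a search string that straddles two chunks would be missed. Below is a rough sketch of one way around that, keeping the last few characters of each chunk and prepending them to the next. The names makeChunkSearcher and streamContains are made up for illustration, and it assumes a plain string search rather than a regex.

```javascript
var fs = require('fs');

// Hypothetical helper: returns a feed(chunk) function that remembers the tail
// of the previous chunk, so a match split across two chunks is still seen.
function makeChunkSearcher(needle) {
  var tail = '';
  var found = false;
  return function feed(chunk) {
    if (found) return true;
    var text = tail + chunk;
    found = text.indexOf(needle) > -1;
    // Keep needle.length - 1 characters: the most a partial match can span.
    tail = needle.length > 1 ? text.slice(-(needle.length - 1)) : '';
    return found;
  };
}

// Hypothetical wrapper: calls then(err, found) exactly once.
function streamContains(path, needle, then) {
  var feed = makeChunkSearcher(needle);
  var found = false;
  var done = false;
  function finish(err) {
    if (done) return;
    done = true;
    then(err, found);
  }
  var stream = fs.createReadStream(path, { encoding: 'utf8' });
  stream.on('data', function (chunk) {
    found = feed(chunk);
    if (found) {
      stream.destroy(); // no need to read the rest of the file
      finish(null);
    }
  });
  stream.on('error', finish);
  stream.on('close', function () { finish(null); });
}
```

Usage would look like streamContains('filename.ext', 'search string', function (err, found) { ... }). Memory use stays at roughly one chunk plus the short tail, however large the file is.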
Still trying to understand the function/callback
Aha! Was just about to post on stream
Based on this spike, I've created an issue for security to prevent code injections:
filter(pattern, keep)
Filter the files in the stream. pattern can be:
String: A glob pattern that files must match.
Function: This function gets the actual path to the file and must return a boolean.
NOTE: relative patterns are resolved against the same base cwd as the one used to set up the stream.
The optional keep parameter indicates whether files matching the pattern are kept in the stream and the others excluded (true), or the other way around (false). Default: true.
var fs = require('fs-stream');
fs('/files/*.*')
  .pipe(fs.filter('/files/*.md'));
Should we use .json or .txt? What's the most efficient & effective way to 🔍 through the file?
Comment your findings below...