TechnikEmpire / HttpFilteringEngine

Transparent filtering TLS proxy.
Mozilla Public License 2.0

Difficult to compile the project #73

Closed (sunflover closed this issue 8 years ago)

sunflover commented 8 years ago

The project is difficult to compile: it depends on a lot of other libs, and it is a VS2015 project. I hope you can provide compiled bin and lib files or make it easier to compile. I hope the compilation process can be improved.

TechnikEmpire commented 8 years ago

Yeah, I never quite got around to finishing the build instructions. I'll straighten all this out probably this weekend, and will upload a release that contains precompiled binaries.

sunflover commented 8 years ago

Thank you, I look forward to it.

kirillv commented 8 years ago

About the deps: what is the gq library in deps? It would be great if there were a description in readme.md covering what this library is, where to download it, etc. Thanks in advance!

TechnikEmpire commented 8 years ago

@kirillv Indeed, the dep build instructions are not complete. I worked on this project for 2 years of my life full time, published it here and moved on. Just taking a bit of down time, then I plan on splitting this into two proper projects. One project being just the proxy library, the other project being the filtering library built on top. I also plan on adding build dep instructions etc, just have not had a chance. GQ is a library I (re)wrote, it's under my profile.

kirillv commented 8 years ago

I've managed to build the native DLL. It was tricky. Some paths are broken. The OpenSSL build script is broken (paths again), the 3rdparty (boost) setup in your gq library is broken, etc. I have not tested the built DLL, but if some help is needed (to write up the build process in the readme), let me know. Thanks in advance! Waiting for new releases!

TechnikEmpire commented 8 years ago

@kirillv Glad you got it working, I've got some other projects in the works right now (priority stuff) but I do plan on coming back to this soon. Thanks for the offer to help.

dkwiebe commented 8 years ago

@kirillv Would you be willing to share any of the changes you made with me? I'd like to try out Stahpit-WPF and need this library to be able to build that project.

kirillv commented 8 years ago

Frankly speaking, I don't remember. I can give you the full project, which can be built (with all deps). I have no time to make an out-of-the-box build system or pull requests. Sorry if that was what you were after.

TechnikEmpire commented 8 years ago

@dkwiebe There are incomplete instructions on the Wiki page. This project is huge and ridiculous. I have not figured out any automated build system suitable for it, because it requires downloading third party tools like Mozilla Build (for building NSS libraries and such), many boost libs, other open source libs I've written and more. It's a pain, and I have not had time to invest in publishing a release, updating the wiki or attempting to implement a better build system.

TechnikEmpire commented 8 years ago

@dkwiebe Also after checking out your business site, just wanted to say FYI that this project or Stahp It are insufficient for blocking adult content. Blocking IP addresses and domains/URI routes is not sufficient, and this is all that is provided here. It's necessary to use natural language processing to categorize payload metadata and content, as well as parse and analyze certain common serialization types to effectively block pornography. However, I stripped this functionality from public versions of my software.

dkwiebe commented 8 years ago

@kirillv Thanks for the response. I appreciate the offer but that's not necessary.

@TechnikEmpire Thanks for the update. That's pretty much what I wanted to find out by building it. Do you have a version of your software with this functionality that is available to license?

TechnikEmpire commented 8 years ago

@dkwiebe unfortunately I don't because I'm pulled in 10 different directions and can't choose one. I suspended development on my porn blocking utilities. I've been working recently when I can to make a packet interceptor via a fake local VPN on Android, which would enable me to plug in my porn blocking stuff but that's a hefty time investment I can't really make right now. I can say you need a lot of data to build accurate NLP models. I think compressed I have several gigs of text I used to build models. Anyway sorry I couldn't be of more help.

TechnikEmpire commented 8 years ago

@dkwiebe On a side note, since I believe very strongly in the work you're doing, I'll offer a gem that's a very simple solution to a rather complex and frustrating problem. Get a copy of domains for your blacklists here:

http://dsi.ut-capitole.fr/blacklists/

Push the porn domains to a hashset or dictionary.
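As a minimal sketch of that step in Python, the function below reads a domain list into a set. It assumes the list is a plain text file with one domain per line; the function name `load_blocked_domains`, the lowercasing, and the skipping of blank or `#` lines are illustrative choices, not something specified in this thread.

```python
from pathlib import Path
from typing import Set


def load_blocked_domains(path: str) -> Set[str]:
    """Read a one-domain-per-line text file into a set for O(1) lookups.

    Assumption: the blacklist is plain text with one domain per line.
    """
    blocked: Set[str] = set()
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    for line in text.splitlines():
        domain = line.strip().lower()
        # Skip blank lines and anything that looks like a comment.
        if domain and not domain.startswith("#"):
            blocked.add(domain)
    return blocked
```

A set is used here because membership checks are the only operation needed later; a dictionary would work equally well if extra data per domain had to be stored.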

Assuming you have control over HTTP responses, any time you get a JSON response, do the following:

  1. split the JSON using a single space as a delimiter
  2. for each resulting substring, trim out anything that's not a valid domain character. For simplicity, you can use a regex like [^a-zA-Z0-9\-\.]
  3. for each cleaned substring, see if it exists in your porn domain hashtable. If so, return HTTP 204 No Content instead of the original response.

This will, very effectively, block all pornography from coming through image/video search engines like Google, Yahoo, Bing and so on. Hope that comes in handy.

dkwiebe commented 8 years ago

@TechnikEmpire Thanks! I'll dig into this. Thanks for the help. If at some point you want to come back to this just drop me a note. I'm working with another company that would be very interested in taking a look at funding some development.

I know what it's like to be pulled 10 ways though. Thanks again for going above and beyond.

TechnikEmpire commented 8 years ago

@dkwiebe I'm actually considering what to put full time dev into right now, trying to pick 1 idea and run with it. Anyway that aside I got my directions wrong in that last bit. It actually goes like this:

For each json payload, do:

  1. Use a regex like [^a-zA-Z0-9\-\.] and replace all instances with a single white space.
  2. Now split the whole string by whitespace delimiter.
  3. Now check each substring to see if it's in the hashtable.

This works because sites like Google, Bing etc. copy porn thumbnails and serve them from their own domains, but they embed metadata in the JSON returned from the search query that reveals the origin of the copied image. No matter how that string is encoded, this process will extract it. Consider a JSON property like this:

OrigSource: "someStupidBase64EncodedString&r=http://redtube.com/thumbs/filth.png"

The regex will convert that to (with runs of spaces collapsed for readability): OrigSource someStupidBase64EncodedString r http redtube.com thumbs filth.png

which, when split by spaces, will reveal "redtube.com" as a substring, and bam, you have your hit. It works very well. Give it a shot and let me know how it goes if you have a chance.
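The corrected steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `find_blocked_domain` is hypothetical, and the lowercasing and trimming of stray leading or trailing `.` and `-` characters are small additions that the steps above do not mention.

```python
import re
from typing import Optional, Set

# Step 1: anything that is not a letter, digit, hyphen or dot is a separator.
NON_DOMAIN_CHARS = re.compile(r"[^a-zA-Z0-9\-.]")


def find_blocked_domain(payload: str, blocked: Set[str]) -> Optional[str]:
    """Return the first blocked domain found in a JSON payload, or None."""
    # Replace every non-domain character with a single space.
    cleaned = NON_DOMAIN_CHARS.sub(" ", payload)
    # Step 2: split on whitespace (split() also drops empty substrings).
    for token in cleaned.split():
        # Assumption: normalise case and trim stray dots/hyphens.
        candidate = token.lower().strip(".-")
        # Step 3: exact lookup against the blocked set.
        if candidate in blocked:
            return candidate
    return None


payload = '{"OrigSource": "someStupidBase64EncodedString&r=http://redtube.com/thumbs/filth.png"}'
hit = find_blocked_domain(payload, {"redtube.com"})
```

If `hit` is not `None`, the proxy would answer with HTTP 204 instead of forwarding the JSON. Note that this sketch does exact matching only, so a token such as `www.redtube.com` matches only if that exact name is in the set.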

TechnikEmpire commented 8 years ago

@dkwiebe Lastly, I am interested in putting work into something like this. If there's an opportunity or an idea you have, you can reach me at info@technikempire.com. I put two years into developing all this stuff. Stahp It represents the majority of it, but I was heavily focused on blocking porn, yet omitted that stuff because those developments were the "gold" of the project, so to speak. I had to drop it and just publish what I had, of course, because after 2 years I had to get back to earning a cheque. Haha, anyway, thanks again for your interest and all the best either way.