VirusTotal / vt-py

The official Python 3 client library for VirusTotal
https://virustotal.github.io/vt-py/
Apache License 2.0

Get the data as response from URL report #47

Closed hritik5102 closed 3 years ago

hritik5102 commented 3 years ago

Here is the code

import vt
client = vt.Client(<API KEY>)
analysis = client.scan_url('https://21stcenturywire.com/2021/04/07/texas-governor-signs-order-banning-use-of-vaccine-passports/', wait_for_completion=True)
print(analysis)

Output

analysis u-17cca8a680d8c2b04a044cc689cdb2dadde2b43abcc2edfe290f3cb552d49bbe-1619945963
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x0000017280919610>
Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x0000017280944220>, 6975.046)]']
connector: <aiohttp.connector.TCPConnector object at 0x0000017280919430>

But I want the result as JSON, in the format of the URL report, so I can add it to my Streamlit web app: https://developers.virustotal.com/reference#url-report

{
 'response_code': 1,
 'verbose_msg': 'Scan finished, scan information embedded in this object',
 'scan_id': '1db0ad7dbcec0676710ea0eaacd35d5e471d3e11944d53bcbd31f0cbd11bce31-1390467782',
 'permalink': 'https://www.virustotal.com/url/__urlsha256__/analysis/1390467782/',
 'url': 'http://www.virustotal.com/',
 'scan_date': '2014-01-23 09:03:02',
 'filescan_id': null,
 'positives': 0,
 'total': 51,
 'scans': {
    'CLEAN MX': {
      'detected': false, 
      'result': 'clean site'
    },
    'MalwarePatrol': {
      'detected': false, 
      'result': 'clean site'
    }
  }
}

hritik5102 commented 3 years ago

Hello @plusvic @chexca @mgmacias95 @aramirezmartin, can you help me resolve this issue?

hritik5102 commented 3 years ago

I also tried the VirusTotal API v2, but it doesn't use the async approach, so I can't see live updates and have to wait for the analysis to complete.

Here is the code

import streamlit as st
import requests, os 

try: 
    url = 'https://www.virustotal.com/vtapi/v2/url/report'
    params = {'apikey': os.environ.get('VIRUS_TOTAL_API_KEY'), 'resource': user_input}
    response = requests.get(url, params=params)
    json_object = response.json()

    if json_object['scans'] is not None:
        scans = json_object['scans']
        print(scans)
    else:
        st.warning("Couldn't detect the site, or an invalid URL was provided!")

except Exception as ec:
    st.info("The URL analysis is in progress, you will not see live updates, the results will appear all at once in at most 60 seconds.")

mgmacias95 commented 3 years ago

Hello @hritik5102,

Once you have your analysis object (returned from scan_url function) you can use its to_dict method:

analysis.to_dict()
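If the goal is a report shaped like the v2 example quoted in the question, the dictionary returned by to_dict() can be reshaped by hand. The following is only a sketch: the helper name to_v2_style is hypothetical, and counting the 'malicious' and 'suspicious' categories as detections is an assumption, not something the library defines. The sample input mimics the per-engine entries found under ['attributes']['results'].

```python
import json

# Assumption: these categories count as a detection; adjust as needed.
DETECTED_CATEGORIES = {"malicious", "suspicious"}


def to_v2_style(analysis_dict):
    """Reshape an analysis dict into a v2-like summary (hypothetical helper)."""
    results = analysis_dict["attributes"]["results"]
    scans = {
        engine: {
            "detected": entry["category"] in DETECTED_CATEGORIES,
            "result": entry["result"],
        }
        for engine, entry in results.items()
    }
    positives = sum(1 for s in scans.values() if s["detected"])
    return {"positives": positives, "total": len(scans), "scans": scans}


# Sample data in the shape produced by analysis.to_dict()
sample = {
    "attributes": {
        "results": {
            "MalSilo": {"category": "harmless", "result": "clean",
                        "method": "blacklist", "engine_name": "MalSilo"},
            "Tencent": {"category": "harmless", "result": "clean",
                        "method": "blacklist", "engine_name": "Tencent"},
        }
    }
}

report = to_v2_style(sample)
print(json.dumps(report, indent=2))
```

Because the summary contains only plain dicts, strings, booleans and integers, it can be passed straight to json.dumps or to a Streamlit JSON widget.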

I hope this helps.

Regards, Marta

hritik5102 commented 3 years ago

Hello @mgmacias95, Thank you for your quick response.

Issue no. 01

It did work, but it gives me additional information about the analysis that I don't want. The only thing I need is the JSON data.

'MalSilo': {'category': 'harmless', 'result': 'clean', 'method': 'blacklist', 'engine_name': 'MalSilo'}, 'Nucleon': {'category': 'harmless', 'result': 'clean', 'method': 'blacklist', 'engine_name': 'Nucleon'}, 'BADWARE.INFO': {'category': 'harmless', 'result': 'clean', 'method': 'blacklist', 'engine_name': 'BADWARE.INFO'}, 'ThreatHive': {'category': 'harmless', 'result': 'clean', 'method': 'blacklist', 'engine_name': 'ThreatHive'}, 'FraudScore': {'category': 'harmless', 'result': 'clean', 'method': 'blacklist', 'engine_name': 'FraudScore'}, 'Tencent': {'category': 'harmless', 'result': 'clean', 'method': 'blacklist', 'engine_name': 'Tencent'}, 'Bfore.Ai PreCrime': {'category': 'harmless', 'result': 'clean', 'method': 'blacklist', 'engine_name': 'Bfore.Ai PreCrime'}, 'Baidu-International': {'category': 'harmless', 'result': 'clean', 'method': 'blacklist', 'engine_name': 'Baidu-International'}}}}

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x000001FB2543E670>
Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x000001FB254228E0>, 10162.593)]']
connector: <aiohttp.connector.TCPConnector object at 0x000001FB2543E490>

Issue no. 02

Another problem is that when I use the scan_url_async method, it throws an error. I know what the error means, but I don't know how to resolve it.

Here is the code

import vt
client = vt.Client(os.environ.get('VIRUS_TOTAL_API_KEY'))
analysis = client.scan_url_async('https://21stcenturywire.com/2021/04/07/texas-governor-signs-order-banning-use-of-vaccine-passports/', wait_for_completion=True)
result =  analysis.to_dict() 
print(result)

Output

Traceback (most recent call last):
  File "src/test.py", line 361, in <module>
    result = analysis.to_dict()
AttributeError: 'coroutine' object has no attribute 'to_dict'
sys:1: RuntimeWarning: coroutine 'Client.scan_url_async' was never awaited

Thank you for your time, I hope I didn't bother you much.

hritik5102 commented 3 years ago

Issue no. 01 is resolved. I hadn't closed the connection with client.close().
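One way to make sure close() is always called, even when the scan raises, is to wrap the client in contextlib.closing from the standard library. This is a minimal sketch: scan_to_dict is a hypothetical helper, and FakeClient and FakeAnalysis are stand-ins so the example runs without an API key or network access.

```python
from contextlib import closing


def scan_to_dict(client, url):
    """Scan a URL and always close the client, even if the scan raises."""
    with closing(client):
        analysis = client.scan_url(url, wait_for_completion=True)
        return analysis.to_dict()


# Stand-ins (not part of vt-py) so the sketch runs offline.
class FakeAnalysis:
    def to_dict(self):
        return {"attributes": {"results": {}}}


class FakeClient:
    def __init__(self):
        self.closed = False

    def scan_url(self, url, wait_for_completion=False):
        return FakeAnalysis()

    def close(self):
        self.closed = True


fake = FakeClient()
data = scan_to_dict(fake, "https://www.virustotal.com/")
print(data, fake.closed)
```

With the real library, the same helper would be called with a vt.Client instance in place of FakeClient.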

Issue no. 02 is still alive :(

mgmacias95 commented 3 years ago

Hello @hritik5102,

Try doing

import os
import vt
client = vt.Client(os.environ.get('VIRUS_TOTAL_API_KEY'))
# await is only valid inside an async function
analysis = await client.scan_url_async('https://21stcenturywire.com/2021/04/07/texas-governor-signs-order-banning-use-of-vaccine-passports/', wait_for_completion=True)
result = analysis.to_dict()

or

import os
import vt
client = vt.Client(os.environ.get('VIRUS_TOTAL_API_KEY'))
analysis = client.scan_url('https://21stcenturywire.com/2021/04/07/texas-governor-signs-order-banning-use-of-vaccine-passports/', wait_for_completion=True)
result = analysis.to_dict()

The _async method use case would be launching multiple analyses in parallel and then fetching their results, something like this:

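import asyncio

async def scan_all(client):
    urls_to_scan = ['https://www.google.com/', 'https://www.virustotal.com/', 'https://github.com/']
    # create_task schedules each scan right away; a bare coroutine would not start until awaited
    tasks = [asyncio.create_task(client.scan_url_async(u)) for u in urls_to_scan]
    # do some other work here
    for t in tasks:
        result = await t
        print(result.to_dict())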

If you're only analysing one URL, using scan_url_async instead of scan_url brings (almost) no performance improvement, I'd say.

Keep in mind that await can only be used inside async functions. To learn more, check out the Python docs.
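To illustrate how awaiting inside an async function and running several calls concurrently fit together, here is a small self-contained sketch. fake_scan is a hypothetical stand-in that sleeps instead of contacting VirusTotal; with the real client, a call such as client.scan_url_async(u) would take its place.

```python
import asyncio


async def fake_scan(url, delay=0.01):
    """Stand-in for an awaitable scan call; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return {"url": url, "status": "completed"}


async def scan_all(urls):
    # gather schedules all coroutines at once and returns results in input order
    return await asyncio.gather(*(fake_scan(u) for u in urls))


urls_to_scan = [
    "https://www.google.com/",
    "https://www.virustotal.com/",
    "https://github.com/",
]

# asyncio.run starts the event loop; await itself only appears inside async functions
results = asyncio.run(scan_all(urls_to_scan))
for r in results:
    print(r)
```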

I hope this helps.

Regards, Marta

hritik5102 commented 3 years ago

Yes, it worked perfectly fine, Thanks a lot @mgmacias95 🤗

Now both issues have been resolved. Btw, thanks for the advice, it really helps :)

Adding the source code here for future reference, in case someone faces a similar issue.

import vt
import os
import asyncio
import nest_asyncio
nest_asyncio.apply()

async def hello(user_input):
    client = vt.Client(os.environ.get('VIRUS_TOTAL_API_KEY'))
    analysis = client.scan_url(user_input, wait_for_completion=True)
    result =  analysis.to_dict()
    client.close()
    return result['attributes']['results']

async def main():
    print("Started ...")
    json_data = await asyncio.create_task(hello('https://www.indiatoday.in/coronavirus-outbreak/story/chinese-president-xi-jinping-offers-help-to-india-in-fight-against-covid-19-1796738-2021-04-30'))
    print(json_data)
    print("Finished ...")

asyncio.run(main())
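The code above calls the blocking scan_url from inside a coroutine, which is presumably why nest_asyncio is needed. A possible alternative is to await the async variants directly. The sketch below assumes the client exposes the scan_url_async and close_async coroutines; fetch_results is a hypothetical helper, and the Fake classes are stand-ins so the example runs without an API key.

```python
import asyncio


async def fetch_results(client, url):
    """Await the scan instead of blocking, then close the client."""
    try:
        analysis = await client.scan_url_async(url, wait_for_completion=True)
        return analysis.to_dict()["attributes"]["results"]
    finally:
        # Runs whether or not the scan succeeded
        await client.close_async()


# Stand-ins (not part of vt-py) so the sketch runs offline.
class FakeAnalysis:
    def to_dict(self):
        return {"attributes": {"results": {"Tencent": {"category": "harmless"}}}}


class FakeAsyncClient:
    def __init__(self):
        self.closed = False

    async def scan_url_async(self, url, wait_for_completion=False):
        return FakeAnalysis()

    async def close_async(self):
        self.closed = True


fake_client = FakeAsyncClient()
json_data = asyncio.run(fetch_results(fake_client, "https://www.virustotal.com/"))
print(json_data)
```

Inside an environment that already runs its own event loop, asyncio.run may not be usable, in which case the nest_asyncio approach from the original post may still be needed.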