c-zhong / hdv2013


Investigation of the Host Status of Subnet 1 #8

Open movingname opened 11 years ago

movingname commented 11 years ago

The main goal is to find which hosts in subnet 1 are suspicious. Let's propose some sub tasks and some hypotheses here.

Data Summary

There are 371 IP addresses in subnet 1 (172.10..), in four groups.

junxzm commented 11 years ago

I have plotted the different aspects of the web server in sub01.

There is one figure for every 20 minutes, all with the same time scale on the x-axis. The number of figures for each creature is 62, and the number of reports from web-server 01 is 8852. I am trying to find the abnormal status of this server.

Chen mentioned that this server is likely to be problematic, so I might as well pay some attention to it first.

Also, Mingyi, do you have any clues about the hosts of subnet 1?

movingname commented 11 years ago

Thanks! I have some clarification questions:

  1. You create a figure for the web-server every 20 minutes? Can you show one figure?
  2. What do you mean by creature?
  3. What do you mean by report?

Thanks!

I have not studied the subnet 1 hosts any further yet. I will think of some hypotheses first. If you want, you can edit the top post and add some hypotheses. Then I will try to visualize them.

junxzm commented 11 years ago

Please see my answers below. Thx.

Date: Fri, 24 May 2013 06:20:44 -0700 From: notifications@github.com To: hdv2013@noreply.github.com CC: junxzm@hotmail.com Subject: Re: [hdv2013] Investigation of the Host Status of Subnet 1 (#8)

Thanks! I have some clarification questions:

You create a figure for the web-server every 20 minutes? Can you show one figure?

Yeah. In our last meeting, Chen said that the problem might happen within just ten minutes. An example is in the attachments: the first 20 minutes of numProcs for web server 1 in subnet01. All the other figures are made the same way. They are dot plots: if the server reported its status at a given time point, the status appears in the figure; otherwise the figure is blank at that time point.

What do you mean by creature?

Sorry for the confusion. I meant "feature": the different status metrics such as loadAveragePercent, diskUsagePercent, etc.

What do you mean by report?

Sorry for the confusion again. I mean that, among all the status items, those from webserver01 in subnet1 number 8552.


movingname commented 11 years ago

Got it!

junxzm commented 11 years ago

Mingyi, I have done some work on the status of the hosts in subnet 01, including servers and workstations. The results are as follows:

Images: statuscount, status 1, status 2, statusval 3, statusval 4

Explanations:

The first picture shows the number of reports per host (here a report means a row in the Big Brother data for a given host).

The second to fifth pictures show the statusVal statistics for every host. The y-value is the proportion of reports with a given statusVal (1, 2, 3 or 4) among all reports from that host. For instance, in the picture for statusVal=1, the y-value of IP1 is the number of Big Brother rows from IP1 with statusVal=1 divided by the total number of rows from IP1.
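The proportion described above can be sketched in a few lines of Python. This is only an illustration, not the script that produced the pictures: it assumes each Big Brother row has already been reduced to a hypothetical (host, statusVal) pair, and the sample rows are made up.

```python
from collections import Counter, defaultdict

def status_proportions(rows):
    """Return {host: {statusVal: share of that host's reports}}.

    rows is an iterable of (host, status_val) pairs; this layout is an
    assumption for illustration, not the real Big Brother schema.
    """
    counts = defaultdict(Counter)
    for host, status_val in rows:
        counts[host][status_val] += 1

    proportions = {}
    for host, per_status in counts.items():
        total = sum(per_status.values())  # total number of reports from this host
        proportions[host] = {s: n / total for s, n in per_status.items()}
    return proportions

# Made-up sample rows, only to show the call
sample_rows = [
    ("172.10.1.101", 1), ("172.10.1.101", 3),
    ("172.10.1.101", 3), ("172.10.1.101", 4),
    ("172.10.1.102", 1), ("172.10.1.102", 1),
]
props = status_proportions(sample_rows)
print(props["172.10.1.101"])
```

Each bar in the statusVal pictures would then correspond to one entry such as props[host][3].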

According to the picture for statusVal=3, there are two consecutive high-level sections:

The first, lower one runs from 172.10.1.101 (WSS1-101.BIGMKT1.COM) to 172.10.1.200 (WSS1-200.BIGMKT1.COM).

The second, higher one runs from 172.10.2.1 (WSS1-255.BIGMKT1.COM) to 172.10.2.47 (WSS1-300.BIGMKT1.COM).

According to the picture for statusVal=4, there are three sections where statusVal=4 occurs, but the most prominent is the second one, from 172.10.1.101 (WSS1-101.BIGMKT1.COM) to 172.10.1.200 (WSS1-200.BIGMKT1.COM).

From this we can see that the prominent section in statusVal=4 is the same as the first section in statusVal=3.
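The sections were read off the pictures by eye. A rough way to extract them automatically is sketched below; the function name, the threshold and the toy host names are all assumptions, and ordered_hosts is expected to be in the same order as the hosts in the pictures.

```python
def high_sections(ordered_hosts, share, threshold):
    """Return (first_host, last_host) pairs for runs of consecutive hosts
    whose share is at or above the threshold."""
    sections = []
    start = prev = None
    for host in ordered_hosts:
        if share.get(host, 0.0) >= threshold:
            if start is None:
                start = host      # a new section begins here
            prev = host           # last host seen inside the current section
        elif start is not None:
            sections.append((start, prev))
            start = None
    if start is not None:         # close a section that runs to the end
        sections.append((start, prev))
    return sections

# Toy data, not taken from the Big Brother logs
hosts = ["h1", "h2", "h3", "h4", "h5", "h6"]
share = {"h1": 0.0, "h2": 0.6, "h3": 0.7, "h4": 0.1, "h5": 0.9, "h6": 0.8}
print(high_sections(hosts, share, 0.5))
```

Running it once with the statusVal=3 shares and once with the statusVal=4 shares would show directly whether the sections overlap.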

So here is the hypothesis:

One or several hosts in the subnet01 was/were infected, then it or they infect others.

For example, the hosts in the second section of statusVal=3 infected those in the first section of statusVal=3 (which is also the prominent section of statusVal=4), or the other way around.

What do you think?

junxzm commented 11 years ago

I am sorry, I made a mistake in the last picture, the one for statusVal=4.

The correct one is: statusval 4

movingname commented 11 years ago

This is very good! This helps us to find out suspicious hosts.

Some of my thoughts.

One row/report of big brother data stands for one service (cpu, conn, mem, disk, etc.) So each status report is linked with a service. It seems that the current visualization mixed all services together. Of course the current visualization is very valuable, but we have to think whether mixing them together is the best way.

What we've done so far is mostly a summary of one week of data. The time information is completely lost. So we still cannot draw solid conclusions about our hypotheses. For example, your hypothesis is:

One or several hosts in the subnet01 was/were infected, then it or they infect others.

Let's assume that the hosts with high bars in picture 5 are the first victims because they are unhealthy most of the time. Other people could challenge this conclusion by saying that maybe these hosts always have problems (high cpu load, high disk load, etc.), so they are not interesting.
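One way to answer that objection is to keep the time dimension and compare each host with its own behaviour on other days. The sketch below is only an illustration: the (host, day, statusVal) layout, the function names, the 0.5 threshold and the choice of treating statusVal 3 and 4 as unhealthy are all assumptions, and the sample rows are made up.

```python
from collections import defaultdict

def daily_unhealthy_share(rows, unhealthy=(3, 4)):
    """Return {host: {day: share of that day's reports that are unhealthy}}."""
    totals = defaultdict(lambda: defaultdict(int))
    bad = defaultdict(lambda: defaultdict(int))
    for host, day, status_val in rows:
        totals[host][day] += 1
        if status_val in unhealthy:
            bad[host][day] += 1
    return {
        host: {day: bad[host][day] / n for day, n in days.items()}
        for host, days in totals.items()
    }

def classify(shares, threshold=0.5):
    """Label a host 'chronic' if it is unhealthy on every day,
    'changed' if only on some days, and 'healthy' otherwise."""
    labels = {}
    for host, by_day in shares.items():
        flagged = [share >= threshold for share in by_day.values()]
        if all(flagged):
            labels[host] = "chronic"
        elif any(flagged):
            labels[host] = "changed"
        else:
            labels[host] = "healthy"
    return labels

# Made-up rows: (host, day, statusVal)
sample = [
    ("hostA", 1, 4), ("hostA", 1, 4), ("hostA", 2, 4), ("hostA", 2, 3),
    ("hostB", 1, 1), ("hostB", 1, 1), ("hostB", 2, 4), ("hostB", 2, 4),
    ("hostC", 1, 1), ("hostC", 2, 1),
]
labels = classify(daily_unhealthy_share(sample))
print(labels)
```

Hosts labelled "changed" are the ones whose status shifted during the period, which is closer to what the infection hypothesis needs than a whole-week average.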

I will upload my pictures shortly.

Thanks!

junxzm commented 11 years ago

Sorry for the delay; I am on the train. And my apologies for missing the weekly meeting.

You are right. I ignored the time information.

However, the first task should be to find the suspicious hosts so that we can narrow our focus, and it seems we do not have any idea of them yet. Otherwise we will be overwhelmed by the sheer number of hosts.

At least we can pay more attention to the hosts in picture five rather than the whole subnet.

To answer your question about the host order: the hosts are ordered the same way as in the document named ._BigMktNetwork.txt. That is, if you find the first and the last host of a section from my description, the hosts between them in the document are the hosts in that section of my pictures.

Also, you mentioned the possibly fake IP addresses. Have you studied them any further?

Also, have you found any port scans in subnet01? If so, please share the details. I am going to dig into the suspicious hosts in the pictures.

And once again, we can cooperate this way: I do one task while others do other tasks, and we ask each other questions. In the process of answering questions, we may find valuable things ourselves.

Keep in touch.

movingname commented 11 years ago

Sorry for the delay; I am on the train. And my apologies for missing the weekly meeting.

You are right. I ignored the time information.

I am not criticizing you :) Almost all of us ignored the time information. In this meeting, Chen had some good ideas for adding it.

However, the first task should be to find the suspicious hosts so that we can narrow our focus, and it seems we do not have any idea of them yet. Otherwise we will be overwhelmed by the sheer number of hosts.

Same as above.

At least we can pay more attention to the hosts in picture five rather than the whole subnet.

To answer your question about the host order: the hosts are ordered the same way as in the document named ._BigMktNetwork.txt. That is, if you find the first and the last host of a section from my description, the hosts between them in the document are the hosts in that section of my pictures.

Good!

Also, you mentioned the possibly fake IP addresses. Have you studied them any further?

Not yet. We might consider asking the VAST committee.

Also, have you found any port scans in subnet01? If so, please share the details. I am going to dig into the suspicious hosts in the pictures.

See https://github.com/crazyappleamy/hdv2013/issues/10

And once again, we can cooperate this way: I do one task while others do other tasks, and we ask each other questions. In the process of answering questions, we may find valuable things ourselves.

This is a good approach. But as we move into the critical phase, this way of collaborating becomes hard to control. So our current decision is to work together on a main document and then concentrate our efforts on a few key tasks. We will also make a schedule. Of course, we can discuss further how to collaborate, since all of us are learners.

Keep in touch.

movingname commented 11 years ago

I have created some simple timeline graphs for a rough understanding of the network situation over time.

How were these graphs created?

  1. Each host has several services, and each service has a status level from 1 to 4. I plot only the CPU, Mem and Disk services, and I ignore status 1. There is one graph per service, and each graph has three lines representing the number of reports with status 2, status 3 and status 4.
  2. I plot a whole-network graph and a subnet 1 graph for each service, and the two weeks of data are plotted separately. So in total we have 12 graphs.
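The counting behind such a timeline can be sketched as follows. This is not the code used for the graphs; the (timestamp, service, statusVal) layout, the one-hour bucket size, the service names and the sample rows are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

def timeline_counts(rows, service, bucket_minutes=60):
    """Count reports per (time bucket, status) for one service, skipping status 1.

    bucket_minutes is assumed to divide a day evenly (e.g. 10, 20, 60).
    """
    counts = Counter()
    for ts, svc, status_val in rows:
        if svc != service or status_val == 1:
            continue
        # Round the timestamp down to the start of its bucket
        minute = (ts.hour * 60 + ts.minute) // bucket_minutes * bucket_minutes
        bucket = ts.replace(hour=minute // 60, minute=minute % 60,
                            second=0, microsecond=0)
        counts[(bucket, status_val)] += 1
    return counts

# Made-up rows: (timestamp, service, statusVal)
rows = [
    (datetime(2013, 4, 1, 8, 5), "cpu", 2),
    (datetime(2013, 4, 1, 8, 40), "cpu", 2),
    (datetime(2013, 4, 1, 8, 50), "cpu", 4),
    (datetime(2013, 4, 1, 9, 10), "cpu", 1),   # status 1 is ignored
    (datetime(2013, 4, 1, 9, 15), "disk", 3),  # other service, ignored for cpu
]
cpu_counts = timeline_counts(rows, "cpu")
for (bucket, status_val), n in sorted(cpu_counts.items()):
    print(bucket, status_val, n)
```

Each status value then gives one line in the graph for that service, with the bucket start on the x-axis and the count on the y-axis. Filtering rows to subnet 1 hosts before the call would give the subnet 1 version.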

Graphs

Week 1

image

image

image

Week 2

image

image

image

3. Analysis

3.1 The graphs show the general time trend of the host status.

3.2 The patterns of mem and disk are similar, so we might be able to combine these two. This could make the visualization simpler.

Thanks!

junxzm commented 11 years ago

Great, and thanks. Haha, you are too formal when you say "I am not criticizing you". I like it.

Some questions:

First: Are we still focused on subnet01, or are we now working on the whole network?

This is my understanding of the current situation: no matter which part we focus on, our current visual analysis is not sufficient. The key issue is to show the time dimension.

Second: How about the second week's data? Should we start on it now?

Yes, we can work on that; the graphs above include week 2 data.

Third: Could you please give me the information about the internal and external IP match?

I think the week 2 data has a mapping?

Fourth: What's our main task now?

First, I would say we can freely work on anything we are interested in :)

Second, there are at least three key tasks:

1. Write a main document that summarizes all our current progress and ideas. Chen is working on that and each of us will contribute to it. See https://docs.google.com/document/d/1pFkTzM2SVmOiQWYd1glozPcwaUIcmGcvOj1a44YJ-p0/edit

2. Discuss and design a main visualization tool. Chen has some ideas, and I (or Chen) will draw them in a Google Doc for everybody to discuss.

3. If you all agree that the theme of our project is collaborative hypothesis-based visual analytics, we should then do some research to see how we can find innovative ideas on this theme.

Fifth: As for the fake IPs, do you have any idea about the details (for example, how are they generated, and when did they start to appear)?

Not yet. The thing is, for the host data we do not need to care about it, because that data has the hostname, which is always correct. For the network flow data, which does not have a hostname field, I am not sure of the impact. Maybe Gaoyao can answer, because he is more familiar with that data. I will send him a request.

Sixth: Now it seems the story goes like this: one or more attackers started a port scan and found some weak ports. Then illegal things happened, such as fake IP generation. Then some infections appeared, and then something went wrong in the network.

There is a debate on whether the fake IP generation is part of the attack or just a problem with the data. For example, why would the attacker want to fake an IP? I lean toward the second explanation, but we probably need to ask the VAST people.

In general, your story (or hypothesis) is probably true. But what we need is the most detailed storyline, and this can only be achieved with a better visualization tool.

What do you think?

Thanks.

junxzm commented 11 years ago

Great, this is something I wanted to do next. You seem to have done it already, which saves me a lot of trouble.

I will look into them. Thanks.

movingname commented 11 years ago

I've updated the graphs and I've provided my answers to your questions.

Let's work together to get the award and get some papers published!

junxzm commented 11 years ago

OK, got it. Thanks.

junxzm commented 11 years ago

I think I have some ideas about how to build a story around the collaborative hypothesis-based theme. I will try to write them up on my next train ride.

Thanks.

movingname commented 11 years ago

Great. If you think your idea is good, probably you should use private channels like email to discuss. Remember that this github is public...

junxzm commented 11 years ago

I will never upload my idea to GitHub. When it is done, the document will appear on Google Drive. Thanks for reminding me.

Date: Tue, 28 May 2013 20:27:05 -0700 From: notifications@github.com To: hdv2013@noreply.github.com CC: junxzm@hotmail.com Subject: Re: [hdv2013] Investigation of the Host Status of Subnet 1 (#8)

Great. If you think your idea is good, probably you should use private channels like email to discuss. Remember that this github is public...
